
Query Match 100.0%; Score 20; DB 9; Length 2072;
Best Local Similarity 100.0%; Pred. No. 8.4;
Matches 20; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy      1 gcgtctctactgcctcttcg 20
          ||||||||||||||||||||
Db    162 GCGTCTCTACTGCCTCTTCG 181






RESULT 1
BC014484
LOCUS       BC014484     1685 bp    mRNA    linear   PRI 19-SEP-2001
DEFINITION  Homo sapiens, Similar to dystonia 1, torsion (autosomal dominant;
            torsin A), clone MGC:23205 IMAGE:4869856, mRNA, complete cds.
ACCESSION   BC014484
VERSION     BC014484.1  GI:15680257
KEYWORDS    MGC.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1685)
  AUTHORS   Strausberg,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-SEP-2001) National Institutes of Health, Mammalian
            Gene Collection (MGC), Cancer Genomics Office, National Cancer
            Institute, 31 Center Drive, Room 11A03, Bethesda, MD 20892-2590,
            USA
  REMARK    NIH-MGC Project URL: http://mgc.nci.nih.gov
COMMENT     Contact: MGC help desk
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: ATCC/DCTD/DTP
            cDNA Library Preparation: Rubin Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Genome Sequence Centre,
            BC Cancer Agency, Vancouver, BC, Canada
            info@bcgsc.bc.ca
            Steven Jones, Jennifer Asano, Ian Bosdet, Yaron Butterfield,
            Susanna Chan, Readman Chiu, Chris Fjell, Erin Garland, Ran Guin,
            Letticia Hsiao, Martin Krzywinski, Reta Kutsche, Oliver Lee, Soo
            Sen Lee, Victor Ling, Carrie Mathewson, Candice McLeavy, Steven
            Ness, Pawan Pandoh, Anna-Liisa Prabhu, Parvaneh Saeedi, Jacqueline
            Schein, Duane Smailus, Michael Smith, Lorraine Spence, Jeff Stott,
            Michael Thorne, Miranada Tsai, Natasja van den Bosch, Jill Vardy,
            George Yang, Scott Zuyderduyn, Marco Marra.
            Clone distribution: MGC clone distribution information can be found
            through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
            Series: IRAL Plate: 34 Row: 1 Column: 17
            This clone was selected for full length sequencing because it
            passed the following selection criteria: Similarity but not
            identity to protein.
FEATURES             Location/Qualifiers
     source          1..1685
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="MGC:23205 IMAGE:4869856"
                     /tissue_type="Skin, melanotic melanoma, high MDR."
                     /clone_lib="NIH_MGC_49"
                     /lab_host="DH10B-R"
                     /note="Vector: pOTB7"
     CDS             20..613
                     /codon_start=1
                     /product="Similar to dystonia 1, torsion (autosomal
                     dominant; torsin A)"
                     /protein_id="AAH14484.1"
                     /db_xref="GI:15680258"
                     /translation="MKLGRAVLGLLLLAPSWQAVEPISLGLALAGVLTGYIYPRLYC
                     LFAECCGQKRSLSREALQKDLDDNLFGQHLAKKIILNAVFGFINNPKPKKPLTLSLHG
                     WTGTGKNFVSKIIAENIYEGGLNSDYVHLFVATLHFPHASNITLYKARMEVWNPFLDV
                     IGFGVSLLWDEIWEFYVEMSEPGKRFMSQFPLERCRS"
BASE COUNT      379 a    421 c    425 g    460 t
ORIGIN



Query Match 98.0%; Score 392; DB 9; Length 1685;
Best Local Similarity 98.8%; Pred. No. 2.5e-117;
Matches 395; Conservative 0; Mismatches 5; Indels 0; Gaps 0;

Qy      1 gaatatttacgagggtggtctgaacagtgactatgtccacctgtttgtggccacattgct 60
Db    370 [sequence illegible in source copy] 429

Qy     61 ctttccacatgcttcaaacatcaccttgtacaaggcaaggatggaagtttggaatccctt 120
Db    430 [sequence illegible in source copy] 489

Qy    121 cctggatgtcatcgggtttggggtctctttgttgtgggatgagatttgggagttctatgt 180
Db    490 [sequence illegible in source copy] 549

Qy    181 tgaaatgagtgagcccggaaaacggttcatgtctcagttccccttggaaaggtgtagaag 240
Db    550 [sequence illegible in source copy] 609

Qy    241 ttaagagtttgagatgcgtggagcagttaataccatcaaagctttgtggtgggttctgaa 300
Db    610 [sequence illegible in source copy] 669

Qy    301 aatcggtccagtgagtatgtagggtcatgggattttagaggtggacatgatcaaatccat 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    670 AATCGGTCCAGTGAGTATGTAGGGTCATGGGATTTTAGAGGTGGACATGATCAAATCCAT 729

Qy    361 cttagagatcaacacatctcactcatttttattttcttat 400
          |||||||||||||||||||||||||||||| | || || |
Db    730 CTTAGAGATCAACACATCTCACTCATTTTTTTATTTTTTT 769
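The /translation qualifier of the BC014484 record is the conceptual translation of CDS 20..613. The alignment places query base 1 at Db 370, so query base 2 sits at Db 371, which is in frame with the CDS start (371 - 20 = 351, a multiple of 3); reading the query from its second base should therefore reproduce part of that protein. The following is a minimal Python sketch assuming the standard genetic code (NCBI translation table 1); `translate`, `CODON_TABLE` and the other names are hypothetical and are not part of the search software that produced this report.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each position cycling through T, C, A, G, so codon index = 16*i + 4*j + k.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(seq: str, frame: int = 1) -> str:
    """Translate `seq` starting at 1-based reading frame 1, 2 or 3.

    A trailing partial codon is ignored; any codon not made of A/C/G/T
    (for example one containing N) is rendered as 'X', and '*' marks a stop.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper()
    codons = (seq[pos:pos + 3] for pos in range(frame - 1, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 40 bases of the query, read from query base 2.
print(translate("gaatatttacgagggtggtctgaacagtgactatgtccac", frame=2))
# prints NIYEGGLNSDYVH
```

The printed peptide appears verbatim in the BC014484 translation (...KIIAENIYEGGLNSDYVHLFV...). The same frame also explains the residue at query positions 59-61: in the RESULT 2 alignment the query reads ctc there while AC027008 reads CAC, and `translate("ctc")` gives "L" whereas `translate("cac")` gives "H", the residue carried at that point of the BC014484 translation (...VATLHFPH...).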



RESULT 2
AC027008
LOCUS       AC027008   166889 bp    DNA     linear   HTG 08-NOV-2000
DEFINITION  Homo sapiens chromosome 8 clone RP11-212N14 map 8, WORKING DRAFT
            SEQUENCE, 26 unordered pieces.
ACCESSION   AC027008
VERSION     AC027008.4  GI:10280898
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 166889)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C. and Lander,E.
  TITLE     Homo sapiens chromosome 8, clone RP11-212N14
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 166889)
  AUTHORS   Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
            Anderson,S., Baldwin,J., Barna,N., Bastien,V., Beda,F.,
            Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G.,
            Campopiano,A., Castle,A., Choepel,Y., Colangelo,M., Collins,S.,
            Collymore,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S.,
            Dodge,S., Domino,M., Doyle,M., Ferreira,P., FitzHugh,W., Gage,D.,
            Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
            Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
            Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
            Klein,J., LaRocque,K., Lamazares,R., Landers,T., Lehoczky,J.,
            Levine,R., Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N.,
            McCarthy,M., McEwan,P., McGurk,A., McKernan,K., McPheeters,R.,
            Meldrim,J., Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J.,
            Murphy,T., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
            O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K., Pierre,N.,
            Pisani,C., Pollara,V., Raymond,C., Riley,R., Rogov,P., Rothman,D.,
            Roy,A., Santos,R., Schauer,S., Severy,P., Spencer,B.,
            Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
            Tesfaye,S., Theodore,J., Tirrell,A., Travers,M., Trigilio,J.,
            Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J.,
            Young,G., Zainoun,J., Zimmer,A. and Zody,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-MAR-2000) Whitehead Institute/MIT Center for Genome
            Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT     On Sep 23, 2000 this sequence version replaced gi:8080819.
            All repeats were identified using RepeatMasker:
            Smit, A.F.A. & Green, P. (1996-1997)
            http://ftp.genome.washington.edu/RM/RepeatMasker.html
            Genome Center
            Center: Whitehead Institute/MIT Center for Genome Research
            Center code: WIBR
            Web site: http://www-seq.wi.mit.edu
            Contact: sequence_submissions@genome.wi.mit.edu
            Project Information
            Center project name: L8771
            Center clone name: 212_N_14
            Summary Statistics
            Sequencing vector: M13; M77815; 100% of reads
            Chemistry: Dye-terminator Big Dye; 100% of reads
            Assembly program: Phrap; version 0.960731
            Consensus quality: 150653 bases at least Q40
            Consensus quality: 158747 bases at least Q30
            Consensus quality: 161880 bases at least Q20
            Insert size: 186000; agarose-fp
            Insert size: 164389; sum-of-contigs
            Quality coverage: 3.6 in Q20 bases; agarose-fp
            Quality coverage: 4.0 in Q20 bases; sum-of-contigs
            NOTE: This is a 'working draft' sequence. It currently
            consists of 26 contigs. The true order of the pieces
            is not known and their order in this sequence record is
            arbitrary. Gaps between the contigs are represented as
            runs of N, but the exact sizes of the gaps are unknown.
            This record will be updated with the finished sequence
            as soon as it is available and the accession number will
            be preserved.
                 1    140: contig of 140 bp in length
               141    240: gap of 100 bp
               241   1566: contig of 1326 bp in length
              1567   1666: gap of 100 bp
              1667  26279: contig of 24613 bp in length
             26280  26379: gap of 100 bp
             26380  27676: contig of 1297 bp in length
             27677  27776: gap of 100 bp
             27777  29820: contig of 2044 bp in length
             29821  29920: gap of 100 bp
             29921  33216: contig of 3296 bp in length
             33217  33316: gap of 100 bp
             33317  36627: contig of 3311 bp in length
             36628  36727: gap of 100 bp
             36728  39382: contig of 2655 bp in length
             39383  39482: gap of 100 bp
             39483  42417: contig of 2935 bp in length
             42418  42517: gap of 100 bp
             42518  46306: contig of 3789 bp in length
             46307  46406: gap of 100 bp
             46407  50207: contig of 3801 bp in length
             50208  50307: gap of 100 bp
             50308  53363: contig of 3056 bp in length
             53364  53463: gap of 100 bp
             53464  56760: contig of 3297 bp in length
             56761  56860: gap of 100 bp
             56861  61207: contig of 4347 bp in length
             61208  61307: gap of 100 bp
             61308  65984: contig of 4677 bp in length
             65985  66084: gap of 100 bp
             66085  72072: contig of 5988 bp in length
             72073  72172: gap of 100 bp
             72173  77741: contig of 5569 bp in length
             77742  77841: gap of 100 bp
             77842  85850: contig of 8009 bp in length
             85851  85950: gap of 100 bp
             85951  92902: contig of 6952 bp in length
             92903  93002: gap of 100 bp
             93003 103668: contig of 10666 bp in length
            103669 103768: gap of 100 bp
            103769 109322: contig of 5554 bp in length
            109323 109422: gap of 100 bp
            109423 118526: contig of 9104 bp in length
            118527 118626: gap of 100 bp
            118627 128874: contig of 10248 bp in length
            128875 128974: gap of 100 bp
            128975 138016: contig of 9042 bp in length
            138017 138116: gap of 100 bp
            138117 166500: contig of 28384 bp in length
            166501 166600: gap of 100 bp
            166601 166889: contig of 289 bp in length.
FEATURES             Location/Qualifiers
     source          1..166889
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="8"
                     /map="8"
                     /clone="RP11-212N14"
                     /clone_lib="RPCI-11 Human Male BAC"



     misc_feature    1..140
                     /note="assembly fragment
                     clone_end: SP6
                     vector_side: left"
     misc_feature    241..1566
                     /note="assembly fragment"
     misc_feature    1667..26279
                     /note="assembly fragment"
     misc_feature    26380..27676
                     /note="assembly fragment"
     misc_feature    27777..29820
                     /note="assembly fragment"
     misc_feature    29921..33216
                     /note="assembly fragment"
     misc_feature    33317..36627
                     /note="assembly fragment"
     misc_feature    36728..39382
                     /note="assembly fragment"
     misc_feature    39483..42417
                     /note="assembly fragment"
     misc_feature    42518..46306
                     /note="assembly fragment"
     misc_feature    46407..50207
                     /note="assembly fragment"
     misc_feature    50308..53363
                     /note="assembly fragment"
     misc_feature    53464..56760
                     /note="assembly fragment"
     misc_feature    56861..61207
                     /note="assembly fragment"
     misc_feature    61308..65984
                     /note="assembly fragment"
     misc_feature    66085..72072
                     /note="assembly fragment"
     misc_feature    72173..77741
                     /note="assembly fragment"
     misc_feature    77842..85850
                     /note="assembly fragment"
     misc_feature    85951..92902
                     /note="assembly fragment"
     misc_feature    93003..103668
                     /note="assembly fragment"
     misc_feature    103769..109322
                     /note="assembly fragment"
     misc_feature    109423..118526
                     /note="assembly fragment"
     misc_feature    118627..128874
                     /note="assembly fragment"
     misc_feature    128975..138016
                     /note="assembly fragment"
     misc_feature    138117..166500
                     /note="assembly fragment"
     misc_feature    166601..166889
                     /note="assembly fragment
                     clone_end: T7
                     vector_side: right"
BASE COUNT    43782 a  38337 c  39037 g  43225 t   2508 others
ORIGIN



Query Match 98.0%; Score 392; DB 2; Length 166889;
Best Local Similarity 98.8%; Pred. No. 4.3e-117;
Matches 395; Conservative 0; Mismatches 5; Indels 0; Gaps 0;

Qy      1 gaatatttacgagggtggtctgaacagtgactatgtccacctgtttgtggccacattgct 60
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 117201 GAATATTTACGAGGGTGGTCTGAACAGTGACTATGTCCACCTGTTTGTGGCCACATTGCA 117260

Qy     61 ctttccacatgcttcaaacatcaccttgtacaaggcaaggatggaagtttggaatccctt 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 117261 CTTTCCACATGCTTCAAACATCACCTTGTACAAGGCAAGGATGGAAGTTTGGAATCCCTT 117320

Qy    121 cctggatgtcatcgggtttggggtctctttgttgtgggatgagatttgggagttctatgt 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 117321 CCTGGATGTCATCGGGTTTGGGGTCTCTTTGTTGTGGGATGAGATTTGGGAGTTCTATGT 117380

Qy    181 tgaaatgagtgagcccggaaaacggttcatgtctcagttccccttggaaaggtgtagaag 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 117381 TGAAATGAGTGAGCCCGGAAAACGGTTCATGTCTCAGTTCCCCTTGGAAAGGTGTAGAAG 117440

Qy    241 ttaagagtttgagatgcgtggagcagttaataccatcaaagctttgtggtgggttctgaa 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 117441 TTAAGAGTTTGAGATGCGTGGAGCAGTTAATACCATCAAAGCTTTGTGGTGGGTTCTGAA 117500

Qy    301 aatcggtccagtgagtatgtagggtcatgggattttagaggtggacatgatcaaatccat 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 117501 AATCGGTCCAGTGAGTATGTAGGGTCATGGGATTTTAGAGGTGGACATGATCAAATCCAT 117560

Qy    361 cttagagatcaacacatctcactcatttttattttcttat 400
          |||||||||||||||||||||||||||||| | || || |
Db 117561 CTTAGAGATCAACACATCTCACTCATTTTTTTATTTTTTT 117600
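The contig/gap table in the AC027008 COMMENT can be cross-checked against the record's own summary statistics. Ranges are 1-based and inclusive, so a contig's length is end - start + 1; the 26 contig lengths should add up to the "Insert size: 164389; sum-of-contigs" figure, with the 25 gaps of 100 bp making up the rest of the 166889 bp record. A short Python sketch, with the boundaries transcribed from the table (the names `CONTIGS`, `contig_lengths` and `gap_lengths` are illustrative only). As the NOTE states, the order of the pieces in the record is arbitrary, so this checks bookkeeping, not genome order.

```python
# Contig boundaries (1-based, inclusive) transcribed from the AC027008
# working-draft table; adjacent contigs are separated by runs of N.
CONTIGS = [
    (1, 140), (241, 1566), (1667, 26279), (26380, 27676), (27777, 29820),
    (29921, 33216), (33317, 36627), (36728, 39382), (39483, 42417),
    (42518, 46306), (46407, 50207), (50308, 53363), (53464, 56760),
    (56861, 61207), (61308, 65984), (66085, 72072), (72173, 77741),
    (77842, 85850), (85951, 92902), (93003, 103668), (103769, 109322),
    (109423, 118526), (118627, 128874), (128975, 138016),
    (138117, 166500), (166601, 166889),
]


def contig_lengths(contigs):
    """Length of each contig, counting both end coordinates."""
    return [end - start + 1 for start, end in contigs]


def gap_lengths(contigs):
    """Length of the N-run between each pair of adjacent contigs."""
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(contigs, contigs[1:])]


lengths = contig_lengths(CONTIGS)
gaps = gap_lengths(CONTIGS)
print(len(lengths), sum(lengths), sum(gaps))        # 26 164389 2500
print(sum(lengths) + sum(gaps) == CONTIGS[-1][1])   # True: 166889 bp
```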

RESULT 3
AL158207/C
LOCUS       AL158207   169963 bp    DNA     linear   PRI 25-SEP-2001
DEFINITION  Human DNA sequence from clone RP11-409K20 on chromosome 9 Contains
            the TOR1B gene for torsin family 1 member B (torsin B) (DQ1), the
            DYT1 gene for 'dystonia 1, torsion' (autosomal dominant; torsin A)
            (DQ2, TOR1A), the gene for hepatocellular carcinoma-associated
            antigen 59 (HSPC220, LOC51759), the USP20 gene for ubiquitin
            specific protease 20 (KIAA1003), and the gene for formin-binding
            protein 17 (FBP17, includes KIAA0554, FLJ13619, FLJ10754 and
            FLJ10113). Contains ESTs, STSs, GSSs and four CpG islands, complete
            sequence.
ACCESSION   AL158207
VERSION     AL158207.15  GI:12717949
KEYWORDS    HTG; CpG island; DQ1; dystonia; DYT1; FBP17; FLJ10113; FLJ10754;
            FLJ13619; formin-binding; hepatocellular; HSPC220; KIAA0554;
            KIAA1003; LOC51759; protease; TOR1B; torsin; ubiquitin; USP20.
SOURCE      human.



  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 169963)
  AUTHORS   Babbage,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-SEP-2001) Sanger Centre, Hinxton, Cambridgeshire,
            CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
            requests: clonerequest@sanger.ac.uk
COMMENT     On Feb 8, 2001 this sequence version replaced gi:12657099.
            During sequence assembly data is compared from overlapping clones.
            Where differences are found these are annotated as variations
            together with a note of the overlapping clone name. Note that the
            variation annotation may not be found in the sequence submission
            corresponding to the overlapping clone, as we submit sequences with
            only a small overlap as described above.
            This sequence was finished as follows unless otherwise noted: all
            regions were either double-stranded or sequenced with an alternate
            chemistry or covered by high quality data (i.e., phred quality >=
            30); an attempt was made to resolve all sequencing problems, such
            as compressions and repeats; all regions were covered by at least
            one plasmid subclone or more than one M13 subclone; and the
            assembly was confirmed by restriction digest. The following
            abbreviations are used to associate primary accession numbers given
            in the feature table with their source databases: Em:, EMBL; Sw:,
            SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP
            database can be found at
            http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
            was generated from part of bacterial clone contigs of human
            chromosome 9, constructed by the Sanger Centre Chromosome 9 Mapping
            Group. Further information can be found at
            http://www.sanger.ac.uk/HGP/Chr9
            RP11-409K20 is from the library RPCI-11.2 constructed by the group
            of Pieter de Jong. For further details see
            http://www.chori.org/bacpac/home.htm
            VECTOR: pBACe3.6
            This sequence is the entire insert of clone RP11-409K20 The true
            left end of clone RP11-138E2 is at 118932 in this sequence.
FEATURES             Location/Qualifiers
     source          1..169963
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="9"
                     /clone="RP11-409K20"
                     /clone_lib="RPCI-11.2"
     repeat_region   5..86
                     /note="MSTC repeat: matches 46..126 of consensus"
     misc_feature    28..462
                     /note="match: GSS: Em:AQ718881"
     repeat_region   817..992
                     /note="Charlie2 repeat: matches 7..195 of consensus"
     misc_feature    complement(2510..2941)
                     /note="match: GSS: Em:AQ041615"
     misc_feature    2944..3096
                     /note="match: GSS: Em:B74700"
     misc_feature    3329..4807
                     /note="CpG island"
                     /evidence=not_experimental
     mRNA            join(4205..4464,5126..5391,8241..8416,9958..10085,
                     10395..12334)
                     /gene="TOR1B"
                     /note="match: cDNAs: Em:AF007872 Em:AJ297743
                     match: ESTs: Em:AI815528 Em:AW160403 Em:AW972065
                     Em:BF313148 Em:AA112625 Em:BE740991 Em:BE563034
                     Em:BE893335 Em:AV728123 Em:AW952051 Em:AW148938
                     Em:BE502754 Em:AW016676 Em:AI223067 Em:BE108689
                     Em:AI808893 Em:AW173267 Em:AI185247"
                     /product="bA409K20.1.1 (torsin family 1, member B (torsin
                     B) (DQ1))"
                     /evidence=not_experimental
     gene            4205..12334
                     /gene="TOR1B"
     CDS             join(4266..4464,5126..5391,8241..8416,9958..10085,
                     10395..10636)
                     /gene="TOR1B"
                     /note="match: proteins: Tr:O14657"
                     /codon_start=1
                     /evidence=not_experimental
                     /product="bA409K20.1.1 (torsin family 1, member B (torsin
                     B) (DQ1))"
                     /protein_id="CAC88165.1"
                     /db_xref="GI:15787707"
                     /translation="MLRAGWLRGAAAIALLLAARWAAFEPITVGLAIGAASAITGYL
                     SYNDIYCRFAECCREERPLNASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKP
                     LTLSLHGWAGTGKNFVSQIVAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQK
                     WIRGNVSACANSVFIFDEMDKLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLSNAGG
                     DLITKTALDFWRAGRKREDIQLKDLEPVLSVGVFNNKHSGLWHSGLIDKNLIDYFIPF
                     LPLEYRHVKMCVRAEMRARGSAIDEDIVTRVAEEMTFFPRDEKIYSDKGCKTVQSRLD
                     FH"
     mRNA            join(4321..4464,5126..5391,11280..11571)
                     /gene="TOR1B"
                     /note="isoform 3
                     match: ESTs: Em:BF058863 Em:BE315222"
                     /product="bA409K20.1.3 (torsin family 1, member B (torsin
                     B) (DQ1), putative isoform 3)"
                     /evidence=not_experimental
     CDS             join(<4321..4464,5126..5391,11280..11294)
                     /gene="TOR1B"
                     /codon_start=3
                     /evidence=not_experimental
                     /product="bA409K20.1.3 (torsin family 1, member B (torsin
                     B) (DQ1), putative isoform 3)"
                     /protein_id="CAC88166.1"
                     /db_xref="GI:15787708"
                     /translation="RWAAFEPITVGLAIGAASAITGYLSYNDIYCRFAECCREERPL
                     NASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQI
                     VAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQSSLT"
     mRNA            join(5126..5266,8241..8416,9958..10085,10395..10456)
                     /gene="TOR1B"
                     /note="match: ESTs: Em:AI468027"
                     /product="bA409K20.1.2 (isoform 2)"
                     /evidence=not_experimental
     mRNA            join(5159..5391,8241..8416,11280..11319)
                     /gene="TOR1B"
                     /note="isoform 4
                     match: ESTs: Em:AI568476"
                     /product="bA409K20.1.4 (torsin family 1, member B (torsin
                     B) (DQ1), putative isoform 4)"
                     /evidence=not_experimental
     CDS             join(<5159..5391,8241..8416,11280..11289)
                     /gene="TOR1B"
                     /codon_start=3
                     /evidence=not_experimental
                     /product="bA409K20.1.4 (torsin family 1, member B (torsin
                     B) (DQ1), putative isoform 4)"
                     /protein_id="CAC88167.1"
                     /db_xref="GI:15787709"
                     /translation="QHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQIV
                     AENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQKWIRGNVSACANSVFIFDEMD
                     KLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLRVH"
     misc_feature    5188..5526
                     /gene="TOR1B"
                     /note="match: STS: Em:G24606"
     repeat_region   7370..7432
                     /note="MER61E repeat: matches 128..190 of consensus"
     misc_feature    complement(11923..12334)
                     /note="match: STS: Em:G27406"
     misc_feature    complement(12097..12334)
                     /note="match: STS: Em:G24725"
     polyA_signal    12313..12318
                     /gene="TOR1B"
     polyA_site      12334
                     /gene="TOR1B"
     polyA_site      complement(13997)
                     /gene="DYT1"
     mRNA            complement(join(13997..15275,19573..19700,19798..19973,
                     23634..23899,24961..25180))
                     /gene="DYT1"
                     /note="match: cDNAs: Em:AF007871
                     match: ESTs: Em:BE272533 Em:BE314317 Em:BE784377
                     Em:BF203163 Em:BE622540 Em:AI039978 Em:AI770117
                     Em:AW050630 Em:AI970719 Em:BE463967 Em:AI374678
                     Em:AI167967 Em:AI127274 Em:AI699731 Em:AW001722
                     Em:AI301894 Em:AW080988"
                     /product="bA409K20.2 (dystonia 1, torsion (autosomal
                     dominant; torsin A) (DQ2, TOR1A))"
                     /evidence=not_experimental
     gene            complement(13997..25180)
                     /gene="DYT1"
     polyA_signal    complement(14010..14015)
                     /gene="DYT1"
     misc_feature    14016..14298
                     /note="match: STS: Em:G30092"
     misc_feature    complement(14429..14885)
                     /gene="DYT1"
                     /note="match: GSS: Em:B69651"
     misc_feature    complement(14469..14876)
                     /gene="DYT1"
                     /note="match: GSS: Em:B48142"
     misc_feature    complement(14494..14860)
                     /gene="DYT1"
                     /note="match: GSS: Em:AQ566167"
     polyA_site      complement(14632)
                     /gene="DYT1"
     misc_feature    14650..15099
                     /note="match: STS: Em:G60041 Em:G60042"
     misc_feature    14807..14914
                     /note="match: STS: Em:G43378 Em:G43379"
     misc_feature    14885..15212
                     /note="match: GSS: Em:AQ213491"
     misc_feature    14890..15392
                     /note="match: GSS: Em:AQ482600"
     CDS             complement(join(15025..15275,19573..19700,19798..19973,
                     23634..23899,24961..25138))
Query Match 98.0%; Score 392; DB 9; Length 169963;
Best Local Similarity 98.8%; Pred. No. 4.3e-117;
Matches 395; Conservative 0; Mismatches 5; Indels 0; Gaps 0;

Qy      1 gaatatttacgagggtggtctgaacagtgactatgtccacctgtttgtggccacattgct 60
Db  23727 [sequence illegible in source copy] 23668

Qy     61 ctttccacatgcttcaaacatcaccttgtacaaggcaaggatggaagtttggaatccctt 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  23667 CTTTCCACATGCTTCAAACATCACCTTGTACAAGGCAAGGATGGAAGTTTGGAATCCCTT 23608

Qy    121 cctggatgtcatcgggtttggggtctctttgttgtgggatgagatttgggagttctatgt 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  23607 CCTGGATGTCATCGGGTTTGGGGTCTCTTTGTTGTGGGATGAGATTTGGGAGTTCTATGT 23548

Qy    181 tgaaatgagtgagcccggaaaacggttcatgtctcagttccccttggaaaggtgtagaag 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  23547 TGAAATGAGTGAGCCCGGAAAACGGTTCATGTCTCAGTTCCCCTTGGAAAGGTGTAGAAG 23488

Qy    241 ttaagagtttgagatgcgtggagcagttaataccatcaaagctttgtggtgggttctgaa 300
Db  23487 [sequence illegible in source copy] 23428

Qy    301 aatcggtccagtgagtatgtagggtcatgggattttagaggtggacatgatcaaatccat 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  23427 AATCGGTCCAGTGAGTATGTAGGGTCATGGGATTTTAGAGGTGGACATGATCAAATCCAT 23368

Qy    361 cttagagatcaacacatctcactcatttttattttcttat 400
          |||||||||||||||||||||||||||||| | || || |
Db  23367 CTTAGAGATCAACACATCTCACTCATTTTTTTATTTTTTT 23328
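The mRNA and CDS locations in the AL158207 record use GenBank location syntax: ranges are 1-based and inclusive, join() concatenates the listed ranges, and an outer complement() places the joined feature on the reverse strand, as for DYT1 (consistent with the downward-running Db coordinates in the alignment above). The Python sketch below handles only that simple subset (no order(), single-base positions or references to other entries) and is an illustration, not a full location parser; `parse_location` and `spliced_length` are hypothetical names. Summing the exon lengths gives 1011 nt for the TOR1B CDS and 999 nt for the DYT1 CDS; both are multiples of 3, as expected for complete reading frames, whose CDS span includes the stop codon.

```python
import re

# Simple ranges such as "4266..4464" or "<4321..4464"; partial-end markers
# (< and >) are accepted but not otherwise interpreted.
RANGE = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def parse_location(location: str):
    """Return (strand, [(start, end), ...]) for a simple GenBank location.

    Handles join(...) and one outer complement(...); anything more complex
    is outside the scope of this sketch.
    """
    loc = "".join(location.split())  # drop line breaks and spaces
    strand = "+"
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = "-"
        loc = loc[len("complement("):-1]
    spans = [(int(a), int(b)) for a, b in RANGE.findall(loc)]
    return strand, spans


def spliced_length(spans):
    """Total length of the joined ranges (both ends inclusive)."""
    return sum(end - start + 1 for start, end in spans)


tor1b_cds = ("join(4266..4464,5126..5391,8241..8416,9958..10085,"
             "10395..10636)")
dyt1_cds = ("complement(join(15025..15275,19573..19700,19798..19973,"
            "23634..23899,24961..25138))")

for name, loc in [("TOR1B", tor1b_cds), ("DYT1", dyt1_cds)]:
    strand, spans = parse_location(loc)
    n = spliced_length(spans)
    # exon count, spliced length, and codon count (stop codon included)
    print(name, strand, len(spans), n, n // 3)
# TOR1B + 5 1011 337
# DYT1 - 5 999 333
```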
AL158207



Query Match 100.0%; Score 20; DB 9; Length 169963;
Best Local Similarity 100.0%; Pred. No. 9.2;
Matches 20; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy      1 agtagagacgcgggtagatg 20
          ||||||||||||||||||||
Db  25009 AGTAGAGACGCGGGTAGATG 25028
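As a rough consistency check on the TOR1B CDS features of the AL158207 (RP11-409K20) record, the segment lengths of each join() location can be summed and split into codons after honoring /codon_start. The sketch below is a minimal illustration under stated assumptions: coordinates are treated as 1-based and inclusive, the "<" partial marker is simply skipped, and the helper names segment_lengths and codon_count are hypothetical, not part of any GenBank tool. For bA409K20.1.1, .1.3 and .1.4 it gives 337, 141 and 139 codons with no leftover bases, one more than the 336, 140 and 138 residues in the printed /translation qualifiers, which a terminal stop codon would account for.

```python
import re

# Matches "start..end" ranges in a GenBank-style location string; the
# optional "<" / ">" partial markers are skipped, not interpreted.
RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")


def segment_lengths(location: str) -> list[int]:
    """Lengths of the start..end segments, treating coordinates as 1-based and inclusive."""
    return [int(end) - int(start) + 1 for start, end in RANGE.findall(location)]


def codon_count(location: str, codon_start: int = 1) -> tuple[int, int]:
    """(whole codons, leftover bases) after skipping codon_start - 1 leading bases."""
    usable = sum(segment_lengths(location)) - (codon_start - 1)
    return divmod(usable, 3)


# Locations and /codon_start values copied from the AL158207 feature table.
CDS_FEATURES = {
    "bA409K20.1.1": ("join(4266..4464,5126..5391,8241..8416,9958..10085,10395..10636)", 1),
    "bA409K20.1.3": ("join(<4321..4464,5126..5391,11280..11294)", 3),
    "bA409K20.1.4": ("join(<5159..5391,8241..8416,11280..11289)", 3),
}

if __name__ == "__main__":
    for name, (location, start) in CDS_FEATURES.items():
        codons, leftover = codon_count(location, start)
        print(f"{name}: {sum(segment_lengths(location))} nt, {codons} codons, {leftover} left over")
```

The check says nothing about reading frame or stop codons by itself; it only confirms that the printed locations, /codon_start values and translation lengths are mutually consistent.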
RESULT 1
AC027008
LOCUS AC027008 166889 bp DNA linear HTG 08-NOV-2000
DEFINITION Homo sapiens chromosome 8 clone RP11-212N14 map 8, WORKING DRAFT
SEQUENCE, 26 unordered pieces.
ACCESSION AC027008
VERSION AC027008.4 GI:10280898
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 166889)
AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Homo sapiens chromosome 8, clone RP11-212N14
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 166889)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
Anderson,S., Baldwin,J., Barna,N., Bastien,V., Beda,F.,
Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G.,
Campopiano,A., Castle,A., Choepel,Y., Colangelo,M., Collins,S.,
Collymore,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S.,
Dodge,S., Domino,M., Doyle,M., Ferreira,P., FitzHugh,W., Gage,D.,
Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
Klein,J., LaRocque,K., Lamazares,R., Landers,T., Lehoczky,J.,
Levine,R., Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N.,
McCarthy,M., McEwan,P., McGurk,A., McKernan,K., McPheeters,R.,
Meldrim,J., Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J.,
Murphy,T., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K., Pierre,N.,
Pisani,C., Pollara,V., Raymond,C., Riley,R., Rogov,P., Rothman,D.,
Roy,A., Santos,R., Schauer,S., Severy,P., Spencer,B.,
Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
Tesfaye,S., Theodore,J., Tirrell,A., Travers,M., Trigilio,J.,
Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J.,
Young,G., Zainoun,J., Zimmer,A. and Zody,M.
TITLE Direct Submission
JOURNAL Submitted (25-MAR-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT On Sep 23, 2000 this sequence version replaced gi:8080819.

All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center
Center: Whitehead Institute/MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu
Project Information
Center project name: L8771
Center clone name: 212_N_14
Summary Statistics
Sequencing vector: M13; M77815; 100% of reads
Chemistry: Dye-terminator Big Dye; 100% of reads
Assembly program: Phrap; version 0.960731
Consensus quality: 150653 bases at least Q40
Consensus quality: 158747 bases at least Q30
Consensus quality: 161880 bases at least Q20
Insert size: 186000; agarose-fp
Insert size: 164389; sum-of-contigs
Quality coverage: 3.6 in Q20 bases; agarose-fp
Quality coverage: 4.0 in Q20 bases; sum-of-contigs



* NOTE: This is a 'working draft' sequence. It currently
* consists of 26 contigs. The true order of the pieces
* is not known and their order in this sequence record is
* arbitrary. Gaps between the contigs are represented as
* runs of N, but the exact sizes of the gaps are unknown.
* This record will be updated with the finished sequence
* as soon as it is available and the accession number will
* be preserved.
* 1 140: contig of 140 bp in length
* 141 240: gap of 100 bp
* 241 1566: contig of 1326 bp in length
* 1567 1666: gap of 100 bp
* 1667 26279: contig of 24613 bp in length
* 26280 26379: gap of 100 bp
* 26380 27676: contig of 1297 bp in length
* 27677 27776: gap of 100 bp
* 27777 29820: contig of 2044 bp in length
* 29821 29920: gap of 100 bp
* 29921 33216: contig of 3296 bp in length
* 33217 33316: gap of 100 bp
* 33317 36627: contig of 3311 bp in length
* 36628 36727: gap of 100 bp
* 36728 39382: contig of 2655 bp in length
* 39383 39482: gap of 100 bp
* 39483 42417: contig of 2935 bp in length
* 42418 42517: gap of 100 bp
* 42518 46306: contig of 3789 bp in length
* 46307 46406: gap of 100 bp
* 46407 50207: contig of 3801 bp in length
* 50208 50307: gap of 100 bp
* 50308 53363: contig of 3056 bp in length
* 53364 53463: gap of 100 bp
* 53464 56760: contig of 3297 bp in length
* 56761 56860: gap of 100 bp
* 56861 61207: contig of 4347 bp in length
* 61208 61307: gap of 100 bp
* 61308 65984: contig of 4677 bp in length
* 65985 66084: gap of 100 bp
* 66085 72072: contig of 5988 bp in length
* 72073 72172: gap of 100 bp
* 72173 77741: contig of 5569 bp in length
* 77742 77841: gap of 100 bp
* 77842 85850: contig of 8009 bp in length
* 85851 85950: gap of 100 bp
* 85951 92902: contig of 6952 bp in length
* 92903 93002: gap of 100 bp
* 93003 103668: contig of 10666 bp in length
* 103669 103768: gap of 100 bp
* 103769 109322: contig of 5554 bp in length
* 109323 109422: gap of 100 bp
* 109423 118526: contig of 9104 bp in length
* 118527 118626: gap of 100 bp
* 118627 128874: contig of 10248 bp in length
* 128875 128974: gap of 100 bp
* 128975 138016: contig of 9042 bp in length
* 138017 138116: gap of 100 bp
* 138117 166500: contig of 28384 bp in length
* 166501 166600: gap of 100 bp
* 166601 166889: contig of 289 bp in length.
FEATURES Location/Qualifiers
source 1..166889
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="8"
/map="8"
/clone="RP11-212N14"
/clone_lib="RPCI-11 Human Male BAC"
misc_feature 1..140
/note="assembly_fragment
clone_end:SP6
vector_side:left"
misc_feature 241..1566
/note="assembly_fragment"
misc_feature 1667..26279
/note="assembly_fragment"
misc_feature 26380..27676
/note="assembly_fragment"
misc_feature 27777..29820
/note="assembly_fragment"
misc_feature 29921..33216
/note="assembly_fragment"
misc_feature 33317..36627
/note="assembly_fragment"
misc_feature 36728..39382
/note="assembly_fragment"
misc_feature 39483..42417
/note="assembly_fragment"
misc_feature 42518..46306
/note="assembly_fragment"
misc_feature 46407..50207
/note="assembly_fragment"
misc_feature 50308..53363
/note="assembly_fragment"
misc_feature 53464..56760
/note="assembly_fragment"
misc_feature 56861..61207
/note="assembly_fragment"
misc_feature 61308..65984
/note="assembly_fragment"
misc_feature 66085..72072
/note="assembly_fragment"
misc_feature 72173..77741
/note="assembly_fragment"
misc_feature 77842..85850
/note="assembly_fragment"
misc_feature 85951..92902
/note="assembly_fragment"
misc_feature 93003..103668
/note="assembly_fragment"
misc_feature 103769..109322
/note="assembly_fragment"
misc_feature 109423..118526
/note="assembly_fragment"
misc_feature 118627..128874
/note="assembly_fragment"
misc_feature 128975..138016
/note="assembly_fragment"
misc_feature 138117..166500
/note="assembly_fragment"
misc_feature 166601..166889
/note="assembly_fragment
clone_end:T7
vector_side:right"
BASE COUNT 43782 a 38337 c 39037 g 43225 t 2508 others
ORIGIN
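The working-draft NOTE, the Summary Statistics and the BASE COUNT line of AC027008 can be cross-checked with simple interval arithmetic. A minimal sketch, assuming the contig coordinates are 1-based and inclusive as written in the table; CONTIGS, GAPS and length are illustrative names, not GenBank fields. The 26 contigs sum to 164389 bp, matching "Insert size: 164389; sum-of-contigs", the 25 gaps between them contribute 2500 bp, and contigs plus gaps, like the BASE COUNT total, equal the 166889 bp record length.

```python
# Contig coordinates (1-based, inclusive) copied from the AC027008 NOTE table.
CONTIGS = [
    (1, 140), (241, 1566), (1667, 26279), (26380, 27676), (27777, 29820),
    (29921, 33216), (33317, 36627), (36728, 39382), (39483, 42417),
    (42518, 46306), (46407, 50207), (50308, 53363), (53464, 56760),
    (56861, 61207), (61308, 65984), (66085, 72072), (72173, 77741),
    (77842, 85850), (85951, 92902), (93003, 103668), (103769, 109322),
    (109423, 118526), (118627, 128874), (128975, 138016), (138117, 166500),
    (166601, 166889),
]
RECORD_LENGTH = 166889
BASE_COUNT = {"a": 43782, "c": 38337, "g": 39037, "t": 43225, "others": 2508}


def length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based interval."""
    return end - start + 1


# A gap is whatever lies between one contig's end and the next contig's start.
GAPS = [(end + 1, nxt - 1) for (_, end), (nxt, _) in zip(CONTIGS, CONTIGS[1:])]

contig_total = sum(length(s, e) for s, e in CONTIGS)
gap_total = sum(length(s, e) for s, e in GAPS)

if __name__ == "__main__":
    print(len(CONTIGS), len(GAPS), contig_total, gap_total)
    print(contig_total + gap_total == RECORD_LENGTH)
    print(sum(BASE_COUNT.values()) == RECORD_LENGTH)
```

The 2508 "others" in BASE COUNT is at least consistent with the 2500 gap positions being stored as runs of N, although the record does not break that figure down further.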



Query Match 96.0%; Score 190; DB 2; Length 166889;
Best Local Similarity 100.0%; Pred. No. 2.6e-49;
Matches 190; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy      1 ctcgactattatgacctggtggatggggtctcctaccagaaagccatgttcatatttctc 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104685 CTCGACTATTATGACCTGGTGGATGGGGTCTCCTACCAGAAAGCCATGTTCATATTTCTC 104744

Qy     61 aggtaaggtcagggctaggacatgatggatgggccccgagcccaagcctctgagctccag 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104745 AGGTAAGGTCAGGGCTAGGACATGATGGATGGGCCCCGAGCCCAAGCCTCTGAGCTCCAG 104804

Qy    121 gagaaaaccctgtccttacccactgggattgttttgcagcaatgctggagcagaaaggat 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104805 GAGAAAACCCTGTCCTTACCCACTGGGATTGTTTTGCAGCAATGCTGGAGCAGAAAGGAT 104864

Qy    181 cacagatgtg 190
          ||||||||||
Db 104865 CACAGATGTG 104874



RESULT 2
AL158207/c
LOCUS AL158207 169963 bp DNA linear PRI 25-SEP-2001
DEFINITION Human DNA sequence from clone RP11-409K20 on chromosome 9 Contains
the TOR1B gene for torsin family 1 member B (torsin B) (DQ1), the
DYT1 gene for 'dystonia 1, torsion' (autosomal dominant; torsin A)
(DQ2, TOR1A), the gene for hepatocellular carcinoma-associated
antigen 59 (HSPC220, LOC51759), the USP20 gene for ubiquitin
specific protease 20 (KIAA1003), and the gene for formin-binding
protein 17 (FBP17, includes KIAA0554, FLJ13619, FLJ10754 and
FLJ10113). Contains ESTs, STSs, GSSs and four CpG islands, complete
sequence.
ACCESSION AL158207
VERSION AL158207.15 GI:12717949
KEYWORDS HTG; CpG island; DQ1; dystonia; DYT1; FBP17; FLJ10113; FLJ10754;
FLJ13619; formin-binding; hepatocellular; HSPC220; KIAA0554;
KIAA1003; LOC51759; protease; TOR1B; torsin; ubiquitin; USP20.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 169963)
AUTHORS Babbage,A.
TITLE Direct Submission
JOURNAL Submitted (25-SEP-2001) Sanger Centre, Hinxton, Cambridgeshire,
CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
requests: clonerequest@sanger.ac.uk
COMMENT On Feb 8, 2001 this sequence version replaced gi:12657099.
During sequence assembly data is compared from overlapping clones.
Where differences are found these are annotated as variations
together with a note of the overlapping clone name. Note that the
variation annotation may not be found in the sequence submission
corresponding to the overlapping clone, as we submit sequences with
only a small overlap as described above.
This sequence was finished as follows unless otherwise noted: all
regions were either double-stranded or sequenced with an alternate
chemistry or covered by high quality data (i.e., phred quality >=
30); an attempt was made to resolve all sequencing problems, such
as compressions and repeats; all regions were covered by at least
one plasmid subclone or more than one M13 subclone; and the
assembly was confirmed by restriction digest. The following
abbreviations are used to associate primary accession numbers given
in the feature table with their source databases: Em:, EMBL; Sw:,
SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP
database can be found at
http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
was generated from part of bacterial clone contigs of human
chromosome 9, constructed by the Sanger Centre Chromosome 9 Mapping
Group. Further information can be found at
http://www.sanger.ac.uk/HGP/Chr9
RP11-409K20 is from the library RPCI-11.2 constructed by the group
of Pieter de Jong. For further details see
http://www.chori.org/bacpac/home.htm
VECTOR: pBACe3.6
This sequence is the entire insert of clone RP11-409K20. The true
left end of clone RP11-138E2 is at 118932 in this sequence.
FEATURES Location/Qualifiers
source 1..169963
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="9"
/clone="RP11-409K20"
/clone_lib="RPCI-11.2"
repeat_region 5..86
/note="MSTC repeat: matches 46..126 of consensus"
misc_feature 28..462
/note="match: GSS: Em:AQ718881"
repeat_region 817..992
/note="Charlie2 repeat: matches 7..195 of consensus"
misc_feature complement(2510..2941)
/note="match: GSS: Em:AQ041615"
misc_feature 2944..3096
/note="match: GSS: Em:B74700"
misc_feature 3329..4807
/note="CpG island"
/evidence=not_experimental
mRNA join(4205..4464,5126..5391,8241..8416,9958..10085,
10395..12334)
/gene="TOR1B"
/note="match: cDNAs: Em:AF007872 Em:AJ297743
match: ESTs: Em:AI815528 Em:AW160403 Em:AW972065
Em:BF313148 Em:AA112625 Em:BE740991 Em:BE563034
Em:BE893335 Em:AV728123 Em:AW952051 Em:AW148938
Em:BE502754 Em:AW016676 Em:AI223067 Em:BE108689
Em:AI808893 Em:AW173267 Em:AI185247"
/product="bA409K20.1.1 (torsin family 1, member B (torsin
B) (DQ1))"
/evidence=not_experimental
gene 4205..12334
/gene="TOR1B"
CDS join(4266..4464,5126..5391,8241..8416,9958..10085,
10395..10636)
/gene="TOR1B"
/note="match: proteins: Tr:O14657"
/codon_start=1
/evidence=not_experimental
/product="bA409K20.1.1 (torsin family 1, member B (torsin
B) (DQ1))"
/protein_id="CAC88165.1"
/db_xref="GI:15787707"
/translation="MLRAGWLRGAAALALLLAARVVAAFEPITVGLAIGAASAITGYL
SYNDIYCRFAECCREERPLNASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKP
LTLSLHGWAGTGKNFVSQIVAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQK
WIRGNVSACANSVFIFDEMDKLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLSNAGG
DLITKTALDFWRAGRKREDIQLKDLEPVLSVGVFNNKHSGLWHSGLIDKNLIDYFIPF
LPLEYRHVKMCVRAEMRARGSAIDEDIVTRVAEEMTFFPRDEKIYSDKGCKTVQSRLD
FH"
mRNA join(4321..4464,5126..5391,11280..11571)
/gene="TOR1B"
/note="isoform 3
match: ESTs: Em:BF058863 Em:BE315222"
/product="bA409K20.1.3 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 3)"
/evidence=not_experimental
CDS join(<4321..4464,5126..5391,11280..11294)
/gene="TOR1B"
/codon_start=3
/evidence=not_experimental
/product="bA409K20.1.3 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 3)"
/protein_id="CAC88166.1"
/db_xref="GI:15787708"
/translation="RVVAAFEPITVGLAIGAASAITGYLSYNDIYCRFAECCREERPL
NASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQI
VAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQSSLT"
mRNA join(5126..5266,8241..8416,9958..10085,10395..10456)
/gene="TOR1B"
/note="match: ESTs: Em:AI468027"
/product="bA409K20.1.2 (isoform 2)"
/evidence=not_experimental
mRNA join(5159..5391,8241..8416,11280..11319)
/gene="TOR1B"
/note="isoform 4
match: ESTs: Em:AI568476"
/product="bA409K20.1.4 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 4)"
/evidence=not_experimental
CDS join(<5159..5391,8241..8416,11280..11289)
/gene="TOR1B"
/codon_start=3
/evidence=not_experimental
/product="bA409K20.1.4 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 4)"
/protein_id="CAC88167.1"
/db_xref="GI:15787709"
/translation="QHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQIV
AENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQKWIRGNVSACANSVFIFDEMD
KLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLRVH"
misc_feature 5188..5526
/gene="TOR1B"
/note="match: STS: Em:G24606"
repeat_region 7370..7432
/note="MER61E repeat: matches 128..190 of consensus"
misc_feature complement(11923..12334)
/note="match: STS: Em:G27406"
misc_feature complement(12097..12334)
/note="match: STS: Em:G24725"
polyA_signal 12313..12318
/gene="TOR1B"
polyA_site 12334
/gene="TOR1B"
polyA_site complement(13997)
/gene="DYT1"
mRNA complement(join(13997..15275,19573..19700,19798..19973,
23634..23899,24961..25180))
/gene="DYT1"
/note="match: cDNAs: Em:AF007871
match: ESTs: Em:BE272533 Em:BE314317 Em:BE784377
Em:BF203163 Em:BE622540 Em:AI039978 Em:AI770117
Em:AW050630 Em:AI970719 Em:BE463967 Em:AI374678
Em:AI167967 Em:AI127274 Em:AI699731 Em:AW001722
Em:AI301894 Em:AW080988"
/product="bA409K20.2 (dystonia 1, torsion (autosomal
dominant; torsin A) (DQ2, TOR1A))"
/evidence=not_experimental
gene complement(13997..25180)
/gene="DYT1"
polyA_signal complement(14010..14015)
/gene="DYT1"
misc_feature 14016..14298
/note="match: STS: Em:G30092"
misc_feature complement(14429..14885)
/gene="DYT1"
/note="match: GSS: Em:B69651"
misc_feature complement(14469..14876)
/gene="DYT1"
/note="match: GSS: Em:B48142"
misc_feature complement(14494..14860)
/gene="DYT1"
/note="match: GSS: Em:AQ566167"
polyA_site complement(14632)
/gene="DYT1"
misc_feature 14650..15099
/note="match: STS: Em:G60041 Em:G60042"
misc_feature 14807..14914
/note="match: STS: Em:G43378 Em:G43379"
misc_feature 14885..15212
/note="match: GSS: Em:AQ213491"
misc_feature 14890..15392
/note="match: GSS: Em:AQ482600"
CDS complement(join(15025..15275,19573..19700,19798..19973,
23634..23899,24961..25138))

Query Match 96.0%; Score 190; DB 9; Length 169963;
Best Local Similarity 100.0%; Pred. No. 2.6e-49;
Matches 190; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy      1 ctcgactattatgacctggtggatggggtctcctaccagaaagccatgttcatatttctc 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  19859 CTCGACTATTATGACCTGGTGGATGGGGTCTCCTACCAGAAAGCCATGTTCATATTTCTC 19800

Qy     61 aggtaaggtcagggctaggacatgatggatgggccccgagcccaagcctctgagctccag 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  19799 AGGTAAGGTCAGGGCTAGGACATGATGGATGGGCCCCGAGCCCAAGCCTCTGAGCTCCAG 19740

Qy    121 gagaaaaccctgtccttacccactgggattgttttgcagcaatgctggagcagaaaggat 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  19739 GAGAAAACCCTGTCCTTACCCACTGGGATTGTTTTGCAGCAATGCTGGAGCAGAAAGGAT 19680

Qy    181 cacagatgtg 190
          ||||||||||
Db  19679 CACAGATGTG 19670
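In the AL158207/c result the Db coordinates fall from block to block (19859, 19799, 19739, 19679), which suggests that the "/c" suffix marks a hit on the complementary strand. A minimal sketch under that assumption: for an ungapped hit, each printed Db block starts one position below where the previous block ended, and the forward-strand bases at those positions would read as the reverse complement of the aligned query. The names reverse_complement and minus_strand_blocks are hypothetical helpers, and the 60-column block width is taken from the listing.

```python
# Complement table for DNA letters; case is preserved and N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_blocks(db_first: int, aligned_len: int, width: int = 60):
    """Yield (qy_start, qy_end, db_start, db_end) for each printed block of an
    ungapped hit whose Db coordinate falls by one per aligned base."""
    for qy_start in range(1, aligned_len + 1, width):
        qy_end = min(qy_start + width - 1, aligned_len)
        yield qy_start, qy_end, db_first - (qy_start - 1), db_first - (qy_end - 1)


if __name__ == "__main__":
    # Db 19859 and the 190-base aligned span are taken from the listing above.
    for block in minus_strand_blocks(19859, 190):
        print(block)
    # Forward-strand reading of the first six aligned bases, under the '/c' assumption.
    print(reverse_complement("ctcgac"))
```

The computed block starts reproduce the four printed Db start coordinates, which is what a strictly descending, gap-free mapping predicts.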



RESULT 1
AC027008
LOCUS AC027008 166889 bp DNA linear HTG 08-NOV-2000
DEFINITION Homo sapiens chromosome 8 clone RP11-212N14 map 8, WORKING DRAFT
SEQUENCE, 26 unordered pieces.
ACCESSION AC027008
VERSION AC027008.4 GI:10280898
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 166889)
AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Homo sapiens chromosome 8, clone RP11-212N14
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 166889)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
Anderson,S., Baldwin,J., Barna,N., Bastien,V., Beda,F.,
Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G.,
Campopiano,A., Castle,A., Choepel,Y., Colangelo,M., Collins,S.,
Collymore,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S.,
Dodge,S., Domino,M., Doyle,M., Ferreira,P., FitzHugh,W., Gage,D.,
Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
Klein,J., LaRocque,K., Lamazares,R., Landers,T., Lehoczky,J.,
Levine,R., Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N.,
McCarthy,M., McEwan,P., McGurk,A., McKernan,K., McPheeters,R.,
Meldrim,J., Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J.,
Murphy,T., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K., Pierre,N.,
Pisani,C., Pollara,V., Raymond,C., Riley,R., Rogov,P., Rothman,D.,
Roy,A., Santos,R., Schauer,S., Severy,P., Spencer,B.,
Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
Tesfaye,S., Theodore,J., Tirrell,A., Travers,M., Trigilio,J.,
Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J.,
Young,G., Zainoun,J., Zimmer,A. and Zody,M.
TITLE Direct Submission
JOURNAL Submitted (25-MAR-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT On Sep 23, 2000 this sequence version replaced gi:8080819.

All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center
Center: Whitehead Institute/MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu
Project Information
Center project name: L8771
Center clone name: 212_N_14
Summary Statistics
Sequencing vector: M13; M77815; 100% of reads
Chemistry: Dye-terminator Big Dye; 100% of reads
Assembly program: Phrap; version 0.960731
Consensus quality: 150653 bases at least Q40
Consensus quality: 158747 bases at least Q30
Consensus quality: 161880 bases at least Q20
Insert size: 186000; agarose-fp
Insert size: 164389; sum-of-contigs
Quality coverage: 3.6 in Q20 bases; agarose-fp
Quality coverage: 4.0 in Q20 bases; sum-of-contigs



* NOTE: This is a 'working draft' sequence. It currently
* consists of 26 contigs. The true order of the pieces
* is not known and their order in this sequence record is
* arbitrary. Gaps between the contigs are represented as
* runs of N, but the exact sizes of the gaps are unknown.
* This record will be updated with the finished sequence
* as soon as it is available and the accession number will
* be preserved.
* 1 140: contig of 140 bp in length
* 141 240: gap of 100 bp
* 241 1566: contig of 1326 bp in length
* 1567 1666: gap of 100 bp
* 1667 26279: contig of 24613 bp in length
* 26280 26379: gap of 100 bp
* 26380 27676: contig of 1297 bp in length
* 27677 27776: gap of 100 bp
* 27777 29820: contig of 2044 bp in length
* 29821 29920: gap of 100 bp
* 29921 33216: contig of 3296 bp in length
* 33217 33316: gap of 100 bp
* 33317 36627: contig of 3311 bp in length
* 36628 36727: gap of 100 bp
* 36728 39382: contig of 2655 bp in length
* 39383 39482: gap of 100 bp
* 39483 42417: contig of 2935 bp in length
* 42418 42517: gap of 100 bp
* 42518 46306: contig of 3789 bp in length
* 46307 46406: gap of 100 bp
* 46407 50207: contig of 3801 bp in length
* 50208 50307: gap of 100 bp
* 50308 53363: contig of 3056 bp in length
* 53364 53463: gap of 100 bp
* 53464 56760: contig of 3297 bp in length
* 56761 56860: gap of 100 bp
* 56861 61207: contig of 4347 bp in length
* 61208 61307: gap of 100 bp
* 61308 65984: contig of 4677 bp in length
* 65985 66084: gap of 100 bp
* 66085 72072: contig of 5988 bp in length
* 72073 72172: gap of 100 bp
* 72173 77741: contig of 5569 bp in length
* 77742 77841: gap of 100 bp
* 77842 85850: contig of 8009 bp in length
* 85851 85950: gap of 100 bp
* 85951 92902: contig of 6952 bp in length
* 92903 93002: gap of 100 bp
* 93003 103668: contig of 10666 bp in length
* 103669 103768: gap of 100 bp
* 103769 109322: contig of 5554 bp in length
* 109323 109422: gap of 100 bp
* 109423 118526: contig of 9104 bp in length
* 118527 118626: gap of 100 bp
* 118627 128874: contig of 10248 bp in length
* 128875 128974: gap of 100 bp
* 128975 138016: contig of 9042 bp in length
* 138017 138116: gap of 100 bp
* 138117 166500: contig of 28384 bp in length
* 166501 166600: gap of 100 bp
* 166601 166889: contig of 289 bp in length.
FEATURES Location/Qualifiers
source 1..166889
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="8"
/map="8"
/clone="RP11-212N14"
/clone_lib="RPCI-11 Human Male BAC"
misc_feature 1..140
/note="assembly_fragment
clone_end:SP6
vector_side:left"
misc_feature 241..1566
/note="assembly_fragment"
misc_feature 1667..26279
/note="assembly_fragment"
misc_feature 26380..27676
/note="assembly_fragment"
misc_feature 27777..29820
/note="assembly_fragment"
misc_feature 29921..33216
/note="assembly_fragment"
misc_feature 33317..36627
/note="assembly_fragment"
misc_feature 36728..39382
/note="assembly_fragment"
misc_feature 39483..42417
/note="assembly_fragment"
misc_feature 42518..46306
/note="assembly_fragment"
misc_feature 46407..50207
/note="assembly_fragment"
misc_feature 50308..53363
/note="assembly_fragment"
misc_feature 53464..56760
/note="assembly_fragment"
misc_feature 56861..61207
/note="assembly_fragment"
misc_feature 61308..65984
/note="assembly_fragment"
misc_feature 66085..72072
/note="assembly_fragment"
misc_feature 72173..77741
/note="assembly_fragment"
misc_feature 77842..85850
/note="assembly_fragment"
misc_feature 85951..92902
/note="assembly_fragment"
misc_feature 93003..103668
/note="assembly_fragment"
misc_feature 103769..109322
/note="assembly_fragment"
misc_feature 109423..118526
/note="assembly_fragment"
misc_feature 118627..128874
/note="assembly_fragment"
misc_feature 128975..138016
/note="assembly_fragment"
misc_feature 138117..166500
/note="assembly_fragment"
misc_feature 166601..166889
/note="assembly_fragment
clone_end:T7
vector_side:right"



BASE COUNT 43782 a 38337 c 39037 g 43225 t 2508 others 
ORIGIN 

Query Match 99.3%; Score 415; DB 2; Length 166889;
Best Local Similarity 99.3%; Pred. No. 4.1e-111;
Matches 415; Conservative 0; Mismatches 3; Indels 0; Gaps 0;

Qy      1 tttggagtgagacaggactgggttcaggtcccagctctgccacatatagtcttgggcaag 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104287 TTTGGAGTGAGACAGGACTGGGTTCAGGTCCCAGCTCTGCCACATATAGTCTTGGGCAAG 104346

Qy     61 tggagtaagcgctctctgtgcctcagttccctcatctgtaaaatgagaacgatagtgccc 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104347 TGGAGTAAGCGCTCTCTGTGCCTCAGTTCCCTCATCTGTAAAATGAGAACGATAGTGCCC 104406

Qy    121 actccatggggttggtaggaacaaagaagattttgggcatgtaaagttcttagtgccgag 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104407 ACTCCATGGGGTTGGTAGGAACAAAGAAGATTTTGGGCATGTAAAGTTCTTAGTGCCGAG 104466

Qy    181 tgcacagtggtctgtaagtgaagctgcggttcttagtggtagaaggagctgattgatggc 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104467 TGCACAGTGGTCTGTAAGTGAAGCTGCGGTTCTTAGTGGTAGAAGGAGCTGATTGATGGC 104526

Qy    241 cctggctgagaactttgtgttcgctttttcccnttttaattcaggatcagttacagttgt 300
          |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
Db 104527 CCTGGCTGAGAACTTTGTGTTCGCTTTTTCCCCTTTTAATTCAGGATCAGTTACAGTTGT 104586

Qy    301 ggattcgaggcaacgtgagtgcctgtgcgaggtccatcttcatatttgatgaaatggata 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 104587 GGATTCGAGGCAACGTGAGTGCCTGTGCGAGGTCCATCTTCATATTTGATGAAATGGATA 104646

Qy    361 agatgcatgcaggcctcatagatgccntcaancctttcctcgactattatgacctggt 418
          |||||||||||||||||||||||||| |||| ||||||||||||||||||||||||||
Db 104647 AGATGCATGCAGGCCTCATAGATGCCATCAAGCCTTTCCTCGACTATTATGACCTGGT 104704
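The summary line for this hit can be reproduced from the alignment itself. A minimal sketch, assuming that Best Local Similarity is matches divided by the aligned length (matches + mismatches + indels) and that case is ignored when Qy is compared with Db; mismatch_positions and percent are hypothetical helper names. Only the two blocks that are not all-match are checked: the three disagreements are the ambiguous "n" calls in the query, at Qy positions 273, 387 and 392, and 415/418 rounds to the reported 99.3%.

```python
def mismatch_positions(qy: str, db: str, qy_start: int) -> list[int]:
    """1-based query positions where an ungapped Qy/Db block disagrees (case-insensitive)."""
    return [qy_start + i for i, (q, d) in enumerate(zip(qy.upper(), db.upper())) if q != d]


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as printed in the summary line."""
    return round(100 * part / whole, 1)


# The two blocks of the 418-base alignment that are not all-match, copied from the listing.
BLOCKS = [
    (241, "cctggctgagaactttgtgttcgctttttcccnttttaattcaggatcagttacagttgt",
          "CCTGGCTGAGAACTTTGTGTTCGCTTTTTCCCCTTTTAATTCAGGATCAGTTACAGTTGT"),
    (361, "agatgcatgcaggcctcatagatgccntcaancctttcctcgactattatgacctggt",
          "AGATGCATGCAGGCCTCATAGATGCCATCAAGCCTTTCCTCGACTATTATGACCTGGT"),
]

ALIGNED_LENGTH = 418  # Qy 1..418 with no indels reported

mismatches = [pos for start, qy, db in BLOCKS for pos in mismatch_positions(qy, db, start)]
matches = ALIGNED_LENGTH - len(mismatches)

if __name__ == "__main__":
    print(mismatches, matches, percent(matches, ALIGNED_LENGTH))
```

Because the other five blocks are printed as complete matches, counting mismatches in these two blocks alone is enough to recover the Matches and Mismatches figures.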



RESULT 2 
AL158207/C 
LOCUS AL158207 



169963 bp DNA linear PRI 25-SEP-2001 



ACCESSION 

VERSION 

KEYWORDS 



SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 



COMMENT 



Crania t a ; Vertebrata ; Euteleostomi ; 
Catarrhini; Hominidae; Homo. 



DEFINITION Human DNA sequence from clone RP11-409K20 on chromosome 9 Contains 
the T0R1B gene for torsin family 1 member B (torsin B) (DQ1) , the 
DYT1 gene for 'dystonia 1, torsion' (autosomal dominant; torsin A) 
(DQ2, TOR1A) , the gene for hepatocellular carcinoma-associated 
antigen 59 (HSPC220, LOC51759) , the USP20 gene for ubiquitin 
specific protease 20 (KIAA1003) , and the gene for f ormin-binding 
protein 17 (FBP17, includes KIAA0554, FLJ13619, FLJ10754 and 
FLJ10113) . Contains ESTs, STSs, GSSs and four CpG islands, complete 
sequence . 
AL158207 

AL1582 07 . 15 GI; 12717 94 9 

HTG; CpG island; DQ1 ; dystonia; DYT1; FBP17; FLJ10113; FLJ10754; 
FLJ13619; f ormin-binding; hepatocellular; HSPC220; KIAA0554; 
KIAA1003; LOC51759; protease; TOR1B; torsin; ubiquitin; USP20. 
human . 

Homo sapiens 

Eukaryota; Metazoa; Chordata; 
Mammalia; Eutheria; Primates; 
1 (bases 1 to 169963) 
Babbage,A. 
Direct Submission 

Submitted ( 25-SEP-2001 ) Sanger Centre, Hinxton, Cambridgeshire, 
CB10 ISA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone 
requests : clonerequest @s anger . ac . uk 

On Feb 8, 2001 this sequence version replaced gi : 12657099. 
During sequence assembly data is compared from overlapping clones. 
Where differences are found these are annotated as variations 
together with a note of the overlapping clone name. Note that the 
variation annotation may not be found in the sequence submission 
corresponding to the overlapping clone, as we submit sequences with 
only a small overlap as described above. 

This sequence was finished as follows unless otherwise noted: all 
regions were either double-stranded or sequenced with an alternate 
chemistry or covered by high quality data (i.e., phred quality >= 
30); an attempt was made to resolve all sequencing problems, such
as compressions and repeats; all regions were covered by at least
one plasmid subclone or more than one M13 subclone; and the
assembly was confirmed by restriction digest. The following
abbreviations are used to associate primary accession numbers given
in the feature table with their source databases: Em:, EMBL; Sw:,
SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP
database can be found at

http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
was generated from part of bacterial clone contigs of human
chromosome 9, constructed by the Sanger Centre Chromosome 9 Mapping
Group. Further information can be found at
http://www.sanger.ac.uk/HGP/Chr9

RP11-409K20 is from the library RPCI-11.2 constructed by the group 
of Pieter de Jong. For further details see 
http://www.chori.org/bacpac/home.htm
VECTOR: pBACe3.6

This sequence is the entire insert of clone RP11-409K20. The true
left end of clone RP11-138E2 is at 118932 in this sequence.
FEATURES Location/Qualifiers
source 1..169963
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="9"
/clone="RP11-409K20"
/clone_lib="RPCI-11.2"
repeat_region 5..86
/note="MSTC repeat: matches 46..126 of consensus"
misc_feature 28..462
/note="match: GSS: Em:AQ718881"
repeat_region 817..992
/note="Charlie2 repeat: matches 7..195 of consensus"
misc_feature complement(2510..2941)
/note="match: GSS: Em:AQ041615"
misc_feature 2944..3096
/note="match: GSS: Em:B74700"
misc_feature 3329..4807
/note="CpG island"
/evidence=not_experimental
mRNA join(4205..4464,5126..5391,8241..8416,9958..10085,
10395..12334)
/gene="TOR1B"
/note="match: cDNAs: Em:AF007872 Em:AJ297743
match: ESTs: Em:AI815528 Em:AW160403 Em:AW972065
Em:BF313148 Em:AA112625 Em:BE740991 Em:BE563034
Em:BE893335 Em:AV728123 Em:AW952051 Em:AW148938
Em:BE502754 Em:AW016676 Em:AI223067 Em:BE108689
Em:AI808893 Em:AW173267 Em:AI185247"
/product="bA409K20.1.1 (torsin family 1, member B (torsin
B) (DQ1))"
/evidence=not_experimental
gene 4205..12334
/gene="TOR1B"
CDS join(4266..4464,5126..5391,8241..8416,9958..10085,
10395..10636)
/gene="TOR1B"
/note="match: proteins: Tr:O14657"
/codon_start=1
/evidence=not_experimental
/product="bA409K20.1.1 (torsin family 1, member B (torsin
B) (DQ1))"
/protein_id="CAC88165.1"
/db_xref="GI:15787707"
/translation="MLRAGWLRGAAALALLLAARVVAAFEPITVGLAIGAASAITGYL
SYNDIYCRFAECCREERPLNASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKP
LTLSLHGWAGTGKNFVSQIVAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQK
WIRGNVSACANSVFIFDEMDKLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLSNAGG
DLITKTALDFWRAGRKREDIQLKDLEPVLSVGVFNNKHSGLWHSGLIDKNLIDYFIPF
LPLEYRHVKMCVRAEMRARGSAIDEDIVTRVAEEMTFFPRDEKIYSDKGCKTVQSRLD
FH"
mRNA join(4321..4464,5126..5391,11280..11571)
/gene="TOR1B"
/note="isoform 3
match: ESTs: Em:BF058863 Em:BE315222"
/product="bA409K20.1.3 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 3)"
/evidence=not_experimental
CDS join(<4321..4464,5126..5391,11280..11294)
/gene="TOR1B"
/codon_start=3
/evidence=not_experimental
/product="bA409K20.1.3 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 3)"
/protein_id="CAC88166.1"
/db_xref="GI:15787708"
/translation="RVVAAFEPITVGLAIGAASAITGYLSYNDIYCRFAECCREERPL
NASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQI
VAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQSSLT"
mRNA join(5126..5266,8241..8416,9958..10085,10395..10456)
/gene="TOR1B"
/note="match: ESTs: Em:AI468027"
/product="bA409K20.1.2 (isoform 2)"
/evidence=not_experimental
mRNA join(5159..5391,8241..8416,11280..11319)
/gene="TOR1B"
/note="isoform 4
match: ESTs: Em:AI568476"
/product="bA409K20.1.4 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 4)"
/evidence=not_experimental
CDS join(<5159..5391,8241..8416,11280..11289)
/gene="TOR1B"
/codon_start=3
/evidence=not_experimental
/product="bA409K20.1.4 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 4)"
/protein_id="CAC88167.1"
/db_xref="GI:15787709"
/translation="QHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQIV
AENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQKWIRGNVSACANSVFIFDEMD
KLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLRVH"
misc_feature 5188..5526
/gene="TOR1B"
/note="match: STS: Em:G24606"
repeat_region 7370..7432
/note="MER61E repeat: matches 128..190 of consensus"
misc_feature complement(11923..12334)
/note="match: STS: Em:G27406"
misc_feature complement(12097..12334)
/note="match: STS: Em:G24725"
polyA_signal 12313..12318
/gene="TOR1B"
polyA_site 12334
/gene="TOR1B"
polyA_site complement(13997)
/gene="DYT1"
mRNA complement(join(13997..15275,19573..19700,19798..19973,
23634..23899,24961..25180))
/gene="DYT1"
/note="match: cDNAs: Em:AF007871
match: ESTs: Em:BE272533 Em:BE314317 Em:BE784377
Em:BF203163 Em:BE622540 Em:AI039978 Em:AI770117
Em:AW050630 Em:AI970719 Em:BE463967 Em:AI374678
Em:AI167967 Em:AI127274 Em:AI699731 Em:AW001722
Em:AI301894 Em:AW080988"
/product="bA409K20.2 (dystonia 1, torsion (autosomal
dominant; torsin A) (DQ2, TOR1A))"
/evidence=not_experimental
gene complement(13997..25180)
/gene="DYT1"
polyA_signal complement(14010..14015)
/gene="DYT1"
misc_feature 14016..14298
/note="match: STS: Em:G30092"
misc_feature complement(14429..14885)
/gene="DYT1"
/note="match: GSS: Em:B69651"
misc_feature complement(14469..14876)
/gene="DYT1"
/note="match: GSS: Em:B48142"
misc_feature complement(14494..14860)
/gene="DYT1"
/note="match: GSS: Em:AQ566167"
polyA_site complement(14632)
/gene="DYT1"
misc_feature 14650..15099
/note="match: STS: Em:G60041 Em:G60042"
misc_feature 14807..14914
/note="match: STS: Em:G43378 Em:G43379"
misc_feature 14885..15212
/note="match: GSS: Em:AQ213491"
misc_feature 14890..15392
/note="match: GSS: Em:AQ482600"
CDS complement(join(15025..15275,19573..19700,19798..19973,
23634..23899,24961..25138))



Query Match 99.3%; Score 415; DB 9; Length 169963;
Best Local Similarity 99.3%; Pred. No. 4.1e-111;
Matches 415; Conservative 0; Mismatches 3; Indels 0; Gaps 0;



Qy      1 tttggagtgagacaggactgggttcaggtcccagctctgccacatatagtcttgggcaag 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  20257 TTTGGAGTGAGACAGGACTGGGTTCAGGTCCCAGCTCTGCCACATATAGTCTTGGGCAAG 20198

Qy     61 tggagtaagcgctctctgtgcctcagttccctcatctgtaaaatgagaacgatagtgccc 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  20197 TGGAGTAAGCGCTCTCTGTGCCTCAGTTCCCTCATCTGTAAAATGAGAACGATAGTGCCC 20138

Qy    121 actccatggggttggtaggaacaaagaagattttgggcatgtaaagttcttagtgccgag 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  20137 ACTCCATGGGGTTGGTAGGAACAAAGAAGATTTTGGGCATGTAAAGTTCTTAGTGCCGAG 20078

Qy    181 tgcacagtggtctgtaagtgaagctgcggttcttagtggtagaaggagctgattgatggc 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  20077 TGCACAGTGGTCTGTAAGTGAAGCTGCGGTTCTTAGTGGTAGAAGGAGCTGATTGATGGC 20018

Qy    241 cctggctgagaactttgtgttcgctttttcccnttttaattcaggatcagttacagttgt 300
          |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
Db  20017 CCTGGCTGAGAACTTTGTGTTCGCTTTTTCCCCTTTTAATTCAGGATCAGTTACAGTTGT 19958

Qy    301 ggattcgaggcaacgtgagtgcctgtgcgaggtccatcttcatatttgatgaaatggata 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  19957 GGATTCGAGGCAACGTGAGTGCCTGTGCGAGGTCCATCTTCATATTTGATGAAATGGATA 19898

Qy    361 agatgcatgcaggcctcatagatgccntcaancctttcctcgactattatgacctggt 418
          |||||||||||||||||||||||||| |||| ||||||||||||||||||||||||||
Db  19897 AGATGCATGCAGGCCTCATAGATGCCATCAAGCCTTTCCTCGACTATTATGACCTGGT 19840
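The TOR1B CDS features in this record give each exon as a `start..end` range inside `join()`, with `/codon_start` marking how many leading bases precede the first complete codon. As a rough cross-check of such a feature table, the Python sketch below (helper names are hypothetical, not from any library) sums the exon spans, skips the bases before `/codon_start`, and converts the result to an expected residue count. It assumes each CDS ends in a stop codon, which holds for these three because none carries a `>` partial 3' end, and it ignores the `<` partial-start marker since that does not change span lengths.

```python
import re

# Matches 'start..end' pairs, skipping the partial-end markers '<' and '>'.
RANGE_RE = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def parse_ranges(location: str) -> list[tuple[int, int]]:
    """Return the (start, end) pairs of a GenBank join()/complement() location."""
    return [(int(a), int(b)) for a, b in RANGE_RE.findall(location)]


def expected_protein_length(location: str, codon_start: int = 1) -> int:
    """Residue count implied by a CDS location, assuming it ends in a stop codon."""
    spliced = sum(end - start + 1 for start, end in parse_ranges(location))
    coding = spliced - (codon_start - 1)  # drop bases before the first full codon
    return coding // 3 - 1  # the final codon is the stop and is not translated


# CDS locations and /codon_start values copied from the AL158207 feature table.
TOR1B_CDS = {
    "bA409K20.1.1": ("join(4266..4464,5126..5391,8241..8416,9958..10085,10395..10636)", 1),
    "bA409K20.1.3": ("join(<4321..4464,5126..5391,11280..11294)", 3),
    "bA409K20.1.4": ("join(<5159..5391,8241..8416,11280..11289)", 3),
}

for product, (location, codon_start) in TOR1B_CDS.items():
    print(product, expected_protein_length(location, codon_start))
# bA409K20.1.1 336
# bA409K20.1.3 140
# bA409K20.1.4 138
```

For the two putative isoforms the counts agree with the 140- and 138-residue `/translation` qualifiers; a disagreement would point to a transcription error in either the location or the translation.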



RESULT 1

AC027008

LOCUS AC027008 166889 bp DNA linear HTG 08-NOV-2000
DEFINITION Homo sapiens chromosome 8 clone RP11-212N14 map 8, WORKING DRAFT
SEQUENCE, 26 unordered pieces.
ACCESSION AC027008
VERSION AC027008.4 GI:10280898
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 166889)
AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Homo sapiens chromosome 8, clone RP11-212N14
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 166889)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Abraham,H., Allen,N.,
Anderson,S., Baldwin,J., Barna,N., Bastien,V., Beda,F.,
Boguslavkiy,L., Boukhgalter,B., Brown,A., Burkett,G.,
Campopiano,A., Castle,A., Choepel,Y., Colangelo,M., Collins,S.,
Collymore,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S.,
Dodge,S., Domino,M., Doyle,M., Ferreira,P., FitzHugh,W., Gage,D.,
Galagan,J., Gardyna,S., Ginde,S., Goyette,M., Graham,L.,
Grand-Pierre,N., Grant,G., Hagos,B., Heaford,A., Horton,L.,
Howland,J.C., Iliev,I., Johnson,R., Jones,C., Kann,L., Karatas,A.,
Klein,J., LaRocque,K., Lamazares,R., Landers,T., Lehoczky,J.,
Levine,R., Lieu,C., Liu,G., Locke,K., Macdonald,P., Marquis,N.,
McCarthy,M., McEwan,P., McGurk,A., McKernan,K., McPheeters,R.,
Meldrim,J., Meneus,L., Mihova,T., Miranda,C., Mlenga,V., Morrow,J.,
Murphy,T., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
O'Neil,D., Olivar,T.M., Oliver,J., Peterson,K., Pierre,N.,
Pisani,C., Pollara,V., Raymond,C., Riley,R., Rogov,P., Rothman,D.,
Roy,A., Santos,R., Schauer,S., Severy,P., Spencer,B.,
Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
Tesfaye,S., Theodore,J., Tirrell,A., Travers,M., Trigilio,J.,
Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J.,
Young,G., Zainoun,J., Zimmer,A. and Zody,M.
TITLE Direct Submission
JOURNAL Submitted (25-MAR-2000) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT On Sep 23, 2000 this sequence version replaced gi:8080819.

All repeats were identified using RepeatMasker:

Smit, A.F.A. & Green, P. (1996-1997)

http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center

Center: Whitehead Institute/MIT Center for Genome Research

Center code: WIBR

Web site: http://www-seq.wi.mit.edu

Contact: sequence_submissions@genome.wi.mit.edu
Project Information 

Center project name: L8771 

Center clone name: 212_N_14 
Summary Statistics 

Sequencing vector: M13; M77815; 100% of reads 

Chemistry: Dye-terminator Big Dye; 100% of reads 

Assembly program: Phrap; version 0.960731 



Consensus quality: 150653 bases at least Q40 

Consensus quality: 158747 bases at least Q30 

Consensus quality: 161880 bases at least Q20 

Insert size: 186000; agarose-fp 

Insert size: 164389; sum-of-contigs 

Quality coverage: 3.6 in Q20 bases; agarose-fp 

Quality coverage: 4.0 in Q20 bases; sum-of-contigs 



* NOTE: This is a 'working draft' sequence. It currently

* consists of 26 contigs. The true order of the pieces

* is not known and their order in this sequence record is 

* arbitrary. Gaps between the contigs are represented as 

* runs of N, but the exact sizes of the gaps are unknown. 

* This record will be updated with the finished sequence 

* as soon as it is available and the accession number will 

* be preserved. 

* 1 140: contig of 140 bp in length 

* 141 240: gap of 100 bp 

* 241 1566: contig of 1326 bp in length 

* 1567 1666: gap of 100 bp 

* 1667 26279: contig of 24613 bp in length 

* 26280 26379: gap of 100 bp 

* 26380 27676: contig of 1297 bp in length 

* 27677 27776: gap of 100 bp 

* 27777 29820: contig of 2044 bp in length 

* 29821 29920: gap of 100 bp 

* 29921 33216: contig of 3296 bp in length 

* 33217 33316: gap of 100 bp 

* 33317 36627: contig of 3311 bp in length 

* 36628 36727: gap of 100 bp 

* 36728 39382: contig of 2655 bp in length 

* 39383 39482: gap of 100 bp 

* 39483 42417: contig of 2935 bp in length 

* 42418 42517: gap of 100 bp 

* 42518 46306: contig of 3789 bp in length 

* 46307 46406: gap of 100 bp 

* 46407 50207: contig of 3801 bp in length 

* 50208 50307: gap of 100 bp 

* 50308 53363: contig of 3056 bp in length 

* 53364 53463: gap of 100 bp 

* 53464 56760: contig of 3297 bp in length 

* 56761 56860: gap of 100 bp 

* 56861 61207: contig of 4347 bp in length 

* 61208 61307: gap of 100 bp 

* 61308 65984: contig of 4677 bp in length 

* 65985 66084: gap of 100 bp 

* 66085 72072: contig of 5988 bp in length 

* 72073 72172: gap of 100 bp 

* 72173 77741: contig of 5569 bp in length 

* 77742 77841: gap of 100 bp 

* 77842 85850: contig of 8009 bp in length 

* 85851 85950: gap of 100 bp 

* 85951 92902: contig of 6952 bp in length 

* 92903 93002: gap of 100 bp 

* 93003 103668: contig of 10666 bp in length 

* 103669 103768: gap of 100 bp 

* 103769 109322: contig of 5554 bp in length

* 109323 109422: gap of 100 bp

* 109423 118526: contig of 9104 bp in length

* 118527 118626: gap of 100 bp

* 118627 128874: contig of 10248 bp in length

* 128875 128974: gap of 100 bp

* 128975 138016: contig of 9042 bp in length

* 138017 138116: gap of 100 bp

* 138117 166500: contig of 28384 bp in length

* 166501 166600: gap of 100 bp

* 166601 166889: contig of 289 bp in length.
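The working-draft note above lists each contig's start and end and represents every gap as a fixed 100-bp run of N. A quick consistency check of that layout, sketched in Python with hypothetical names, confirms that the 26 listed pieces tile the whole 166,889-bp record with 100-bp gaps and that their lengths add up to the 164,389-bp "sum-of-contigs" insert size quoted in the Summary Statistics. The coordinates are transcribed from the list; the 100-bp gap size is the placeholder value the note gives, not a measured gap.

```python
# Contig coordinates transcribed from the AC027008 working-draft COMMENT.
CONTIGS = [
    (1, 140), (241, 1566), (1667, 26279), (26380, 27676), (27777, 29820),
    (29921, 33216), (33317, 36627), (36728, 39382), (39483, 42417),
    (42518, 46306), (46407, 50207), (50308, 53363), (53464, 56760),
    (56861, 61207), (61308, 65984), (66085, 72072), (72173, 77741),
    (77842, 85850), (85951, 92902), (93003, 103668), (103769, 109322),
    (109423, 118526), (118627, 128874), (128975, 138016), (138117, 166500),
    (166601, 166889),
]
RECORD_LENGTH = 166889
GAP_SIZE = 100  # placeholder run of N between contigs; true gap sizes are unknown


def check_layout(contigs, record_length, gap_size):
    """Confirm the contigs tile the record with fixed-size gaps; return a summary."""
    if contigs[0][0] != 1 or contigs[-1][1] != record_length:
        raise ValueError("contigs do not span the whole record")
    for (_, prev_end), (next_start, _) in zip(contigs, contigs[1:]):
        gap = next_start - prev_end - 1
        if gap != gap_size:
            raise ValueError(f"unexpected gap of {gap} bp after position {prev_end}")
    lengths = [end - start + 1 for start, end in contigs]
    return {
        "contigs": len(contigs),
        "gaps": len(contigs) - 1,
        "sum_of_contigs": sum(lengths),
        "longest": max(lengths),
    }


print(check_layout(CONTIGS, RECORD_LENGTH, GAP_SIZE))
# {'contigs': 26, 'gaps': 25, 'sum_of_contigs': 164389, 'longest': 28384}
```

Raising on the first inconsistency keeps the check strict; a transcription slip in any coordinate shows up as an unexpected gap rather than a silently wrong total.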
FEATURES Location/Qualifiers
source 1..166889
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="8"
/map="8"
/clone="RP11-212N14"
/clone_lib="RPCI-11 Human Male BAC"
misc_feature 1..140
/note="assembly_fragment
clone_end:SP6
vector_side:left"
misc_feature 241..1566
/note="assembly_fragment"
misc_feature 1667..26279
/note="assembly_fragment"
misc_feature 26380..27676
/note="assembly_fragment"
misc_feature 27777..29820
/note="assembly_fragment"
misc_feature 29921..33216
/note="assembly_fragment"
misc_feature 33317..36627
/note="assembly_fragment"
misc_feature 36728..39382
/note="assembly_fragment"
misc_feature 39483..42417
/note="assembly_fragment"
misc_feature 42518..46306
/note="assembly_fragment"
misc_feature 46407..50207
/note="assembly_fragment"
misc_feature 50308..53363
/note="assembly_fragment"
misc_feature 53464..56760
/note="assembly_fragment"
misc_feature 56861..61207
/note="assembly_fragment"
misc_feature 61308..65984
/note="assembly_fragment"
misc_feature 66085..72072
/note="assembly_fragment"
misc_feature 72173..77741
/note="assembly_fragment"
misc_feature 77842..85850
/note="assembly_fragment"
misc_feature 85951..92902
/note="assembly_fragment"
misc_feature 93003..103668
/note="assembly_fragment"
misc_feature 103769..109322
/note="assembly_fragment"
misc_feature 109423..118526
/note="assembly_fragment"
misc_feature 118627..128874
/note="assembly_fragment"
misc_feature 128975..138016
/note="assembly_fragment"
misc_feature 138117..166500
/note="assembly_fragment"
misc_feature 166601..166889
/note="assembly_fragment
clone_end:T7
vector_side:right"
BASE COUNT 43782 a 38337 c 39037 g 43225 t 2508 others
ORIGIN

Query Match 92.3%; Score 1201.8; DB 2; Length 166889;
Best Local Similarity 97.5%; Pred. No. 0;
Matches 1271; Conservative 10; Mismatches 15; Indels 8; Gaps 6;

Qy 1 gccactccaagctaccatctgagattgtttcctgccctagagtggtaaaggcgtgaggtc 60
I I I I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I I I I M I I I I I I I I I I I
Db 108011 GCCACTCCAAGC-ACCATCTGAGATTGTTTCCTGCCCTAGAGTGGTAAAGCCGTGAGGTC 108069

Qy 61 cgtctgccctcagctgtgtccccaggcccagggcgtgcctggcaacannagcaggcctct 120
I I I I I I I I I I I I I I I I I I I M I I I I I I I II M I I I I I I I I I M I I I I I I I I I I I I
Db 108070 CGTCTGCCCTCAGCTGTGTCCCCAGGCCCAGGGCGTGCCTGGCA--CAGAGCAGGCCTCT 108127

Qy 121 gagaaccagcctcccacgtgagttcatgatagnaagacagcccctcgttcccattcagtg 180
I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I M I I I I I I I I I I I I I I I I M M I I I I I
Db 108128 GAGAACCAGCCTCCCACGTGAGTTCATGATAGCAAGACAGCCCCTCGTTCCCATTCAGTG 108187

Qy 181 gttggttctgttctttycctggcmataagctccactctg-ymrtcagccamacatttatt 239
I I I I I I I I I I I I I I I I : I I M I I : I I I I I I I I I I I I I I : : : M M I I I : I I I I I I I I I
Db 108188 GTTGGTTCTGTTCTTTCCCTGGCCATAGGCTCCACTCTGTCAGTCAGCCACACATTTATT 108247

Qy 240 gagtaccagttgttggcaaagcactgttgggcatgaaaagcattaacccagtgaatgagg 299
I I I I I I I I I I I I I I I I I I M I I I I I I I I I M I I I I I I I I I I I I I I M I II I I I I I I I
Db 108248 GAGTACCAGTTGTGTGCAAAGCACTGTTGGGCATGAAAAGCATTAACCCAGTGAATGAGG 108307

Qy 300 aggagcttgggtt-gggacggagccmcaraawtacatggcagaccagaaggaaatcagct 358
I I I I I I I I I I I I I I I I I I I I II I I : I I : I I : I I I I I I I I I I M I I I I I I I I I I I I I I I
Db 108308 AGGAGCTTGGGTTGGGGACGGAGCCCCAGAATTACATGGCAGACCAGAAGGGAATCAGCT 108367

Qy 359 caagtagaaaracacgcatgggctcgtgggcgacgcagtgtgtgctgtgtcatctggggc 418
II I I : I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I M I I I I
Db 108368 CA--GTAGAAGACACGCATGGGCTCGTGGGCGACGCAGTGTGTGCTGTGTCATCTGGGGC 108425

Qy 419 tgggaggaagtgtcctggatcaggagttccaggagcccaggaggagtggacgggtcagtg 478
I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I II
Db 108426 TGGGAGGAAGTGTCCTGGGTCAGGAGTTCCAGGAGCCCAGGAGGAGTGGACGGGTCAGTG 108485

Qy 479 cagagccagcccgcaatcaggggaagaaaacacggccaaggccaggccttcacggggagc 538
I I I I I I I I I I I I I I I I I M I I I I I I I M I I I I I I I I I M I I I I I I I I I I I I I II M I I
Db 108486 CAGAGCCAGCCCGCATTCAGGGG-AGAAAACACGGCCAAGGCCAGGCCTTCACGGGGAGC 108544

Qy 539 ccagcgtgggctgcacatctgcactctccaggctagttttggtgcccacatgctctgcag 598
I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 108545 CCAGCGTGGGCTGCACATCTGCACTCTCCAGGCTAGTTTTGGTGCCCACATGCTCTGCAG 108604

Qy 599 ggtctgggcactgtggcagcggcagcaggcttccctgttgctagtccagctgctgaaact 658
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db 108605 GGTCTGGGCACTGTGGCAGCGGCAGCAGGCTTCCCTGTTGCTAGTCCAGCTGCTGAAACT 108664

Qy 659 ccagggagagtcaaaaagttcccaaatacagaggcgtggctggtagtccttcccgggaat 718
I II II I I I I I I I I II I M I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I
Db 108665 CCAGGGAGAGTCAAAAAGTTCCCAAATACAGAGGCGTGGCTGGTAGTCCTTCCCGGGAAT 108724

Qy 719 tcttcttgcttcccgctttctgtggaactctgccttccccactctgcctctctgcttgtt 778
M I M I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I
Db 108725 TCTTCTTGCTTCCCGCTTTCTGTGGAACTCTGCCTTCCCCACTCTGCCTCTCTGCTTGTT 108784

Qy 779 cctgggccccaggacctctttcccatcttcgatctcttaagtcataccttgggaggcctc 838
M M M M M I M I M II I I I I I I I I I I I II I I I I I I I I I I II M I I I I I I I I I I I I I I I
Db 108785 CCTGGGCCCCAGGACCTCTTTCCCATCTTCGATCTCTTAAGTCATACCTTGGGAGGCCTC 108844

Qy 839 ccccagcccgccgtgtaaagagggctgtcacagcttctgctgtcacagaagcattacaat 898
I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I M I I I I I I I I I I I I I I I I I I M
Db 108845 CCCCAGCCCGCCGTGTAAAGAGGGCTGTCACAGCTTCTGCTGTCACAGAAGCATTACAAT 108904

Qy 899 gtgcaggtgcctgttaacatctgccttccccactgatctggagctccacaagggagaggg 958
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I M
Db 108905 GTGCAGGTGCCTGTTAACATCTGCCTTCCCCACTGATCTGGAGCTCCACAAGGGAGAGGG 108964

Qy 959 cacacccagtaggtatgtgtgggatggataggagggtggatgacacccagtagatgtgta 1018
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I
Db 108965 CACACCCAGTAGGTATGTGTGGGATGGATAGGAGGGTGGATGACACCCAGTAGATGTGTA 109024

Qy 1019 tgggatggataggagggtggatgacacccagtaggtgtgtatgggatggatgggagggtg 1078
I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I
Db 109025 TGGGATGGATAGGAGGGTGGATGACACCCAGTAGGTGTGTATGGGATGGATGGGAGGGTG 109084

Qy 1079 ggtgacccctagtagatgtggggggggtgggtgggtgacccccagtaggtgtgtgtggca 1138
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II
Db 109085 GGTGACCCCTAGTAGATGTGGGGGGGGTGGGTGGGTGACCCCCAGTAGGTGTGTGTGGCA 109144

Qy 1139 tggataggtgacccccagtagacgtttgtgggacggatgggagggtaggtaagtgacccc 1198
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I
Db 109145 TGGATAGGTGACCCCCAGTAGACGTTTGTGGGACGGATGGGAGGGTAGGTAAGTGACCCC 109204

Qy 1199 caggaggcgtctatagggcaggtgggtggatgtggatgaacagcaccttgtttcttcttc 1258
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I
Db 109205 CAGGAGGCGTCTATAGGGCAGGTGGGTGGATGTGGATGTVACAGCACCTTGTTTCTTCTTC 109264

Qy 1259 ccaggtggcttctggcacagcagcttaattgaccggaacctcat 1302
I I I I I M I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 109265 CCAGGTGGCTTCTGGCACAGCAGCTTAATTGACCGGAACCTCAT 109308
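The scoring lines of these hits appear to follow two simple ratios, inferred here from the printed numbers rather than from the search program's documentation: Query Match is Score divided by the query length (taken as the last Qy coordinate, 418 or 1302, on the assumption that a perfect match scores one point per base), and Best Local Similarity is Matches divided by all aligned columns (matches, conservative, mismatches and indels). The Python sketch below, with hypothetical class and field names, reproduces the reported percentages for the 418-base and 1302-base queries under those assumptions.

```python
from dataclasses import dataclass


@dataclass
class HitStats:
    score: float
    query_length: int  # assumed equal to the last Qy coordinate of the alignment
    matches: int
    conservative: int
    mismatches: int
    indels: int

    def query_match(self) -> float:
        # Assumption: the best achievable score equals the query length.
        return 100.0 * self.score / self.query_length

    def best_local_similarity(self) -> float:
        # Assumption: identities divided by every aligned column, gaps included.
        columns = self.matches + self.conservative + self.mismatches + self.indels
        return 100.0 * self.matches / columns


# Values copied from the scoring lines of two hits in this report.
hits = {
    "AL158207, 418-base query": HitStats(415, 418, 415, 0, 3, 0),
    "AC027008, 1302-base query": HitStats(1201.8, 1302, 1271, 10, 15, 8),
}

for name, h in hits.items():
    print(f"{name}: Query Match {h.query_match():.1f}%, "
          f"Best Local Similarity {h.best_local_similarity():.1f}%")
# AL158207, 418-base query: Query Match 99.3%, Best Local Similarity 99.3%
# AC027008, 1302-base query: Query Match 92.3%, Best Local Similarity 97.5%
```

The two measures diverge for the second hit because the fractional score (1201.8) credits the ten conservative IUPAC positions only partially, while the similarity ratio counts only exact matches.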
AL158207 169963 bp DNA linear PRI 25-SEP-2001

Human DNA sequence from clone RP11-409K20 on chromosome 9 Contains
the TOR1B gene for torsin family 1 member B (torsin B) (DQ1), the
DYT1 gene for 'dystonia 1, torsion' (autosomal dominant; torsin A)
(DQ2, TOR1A), the gene for hepatocellular carcinoma-associated
antigen 59 (HSPC220, LOC51759), the USP20 gene for ubiquitin
specific protease 20 (KIAA1003), and the gene for formin-binding
protein 17 (FBP17, includes KIAA0554, FLJ13619, FLJ10754 and
FLJ10113). Contains ESTs, STSs, GSSs and four CpG islands, complete
sequence.
AL158207

AL158207.15 GI:12717949

HTG; CpG island; DQ1; dystonia; DYT1; FBP17; FLJ10113; FLJ10754;
FLJ13619; formin-binding; hepatocellular; HSPC220; KIAA0554;
KIAA1003; LOC51759; protease; TOR1B; torsin; ubiquitin; USP20.
human.

Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
1 (bases 1 to 169963)
Babbage,A.
Direct Submission

Submitted (25-SEP-2001) Sanger Centre, Hinxton, Cambridgeshire,
CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
requests: clonerequest@sanger.ac.uk

On Feb 8, 2001 this sequence version replaced gi:12657099. 
During sequence assembly data is compared from overlapping clones. 
Where differences are found these are annotated as variations 
together with a note of the overlapping clone name. Note that the 
variation annotation may not be found in the sequence submission 
corresponding to the overlapping clone, as we submit sequences with 
only a small overlap as described above. 

This sequence was finished as follows unless otherwise noted: all 
regions were either double-stranded or sequenced with an alternate 
chemistry or covered by high quality data (i.e., phred quality >= 
30); an attempt was made to resolve all sequencing problems, such
as compressions and repeats; all regions were covered by at least
one plasmid subclone or more than one M13 subclone; and the
assembly was confirmed by restriction digest. The following
abbreviations are used to associate primary accession numbers given
in the feature table with their source databases: Em:, EMBL; Sw:,
SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP
database can be found at

http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
was generated from part of bacterial clone contigs of human
chromosome 9, constructed by the Sanger Centre Chromosome 9 Mapping
Group. Further information can be found at
http://www.sanger.ac.uk/HGP/Chr9

RP11-409K20 is from the library RPCI-11.2 constructed by the group
of Pieter de Jong. For further details see
http://www.chori.org/bacpac/home.htm
VECTOR: pBACe3.6

This sequence is the entire insert of clone RP11-409K20. The true
left end of clone RP11-138E2 is at 118932 in this sequence.
FEATURES Location/Qualifiers



source 1..169963
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="9"
/clone="RP11-409K20"
/clone_lib="RPCI-11.2"
repeat_region 5..86
/note="MSTC repeat: matches 46..126 of consensus"
misc_feature 28..462
/note="match: GSS: Em:AQ718881"
repeat_region 817..992
/note="Charlie2 repeat: matches 7..195 of consensus"
misc_feature complement(2510..2941)
/note="match: GSS: Em:AQ041615"
misc_feature 2944..3096
/note="match: GSS: Em:B74700"
misc_feature 3329..4807
/note="CpG island"
/evidence=not_experimental
mRNA join(4205..4464,5126..5391,8241..8416,9958..10085,
10395..12334)
/gene="TOR1B"
/note="match: cDNAs: Em:AF007872 Em:AJ297743
match: ESTs: Em:AI815528 Em:AW160403 Em:AW972065
Em:BF313148 Em:AA112625 Em:BE740991 Em:BE563034
Em:BE893335 Em:AV728123 Em:AW952051 Em:AW148938
Em:BE502754 Em:AW016676 Em:AI223067 Em:BE108689
Em:AI808893 Em:AW173267 Em:AI185247"
/product="bA409K20.1.1 (torsin family 1, member B (torsin
B) (DQ1))"
/evidence=not_experimental
gene 4205..12334
/gene="TOR1B"
CDS join(4266..4464,5126..5391,8241..8416,9958..10085,
10395..10636)
/gene="TOR1B"
/note="match: proteins: Tr:O14657"
/codon_start=1
/evidence=not_experimental
/product="bA409K20.1.1 (torsin family 1, member B (torsin
B) (DQ1))"
/protein_id="CAC88165.1"
/db_xref="GI:15787707"
/translation="MLRAGWLRGAAALALLLAARVVAAFEPITVGLAIGAASAITGYL
SYNDIYCRFAECCREERPLNASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKP
LTLSLHGWAGTGKNFVSQIVAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQK
WIRGNVSACANSVFIFDEMDKLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLSNAGG
DLITKTALDFWRAGRKREDIQLKDLEPVLSVGVFNNKHSGLWHSGLIDKNLIDYFIPF
LPLEYRHVKMCVRAEMRARGSAIDEDIVTRVAEEMTFFPRDEKIYSDKGCKTVQSRLD
FH"
mRNA join(4321..4464,5126..5391,11280..11571)
/gene="TOR1B"
/note="isoform 3
match: ESTs: Em:BF058863 Em:BE315222"
/product="bA409K20.1.3 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 3)"
/evidence=not_experimental
CDS join(<4321..4464,5126..5391,11280..11294)
/gene="TOR1B"
/codon_start=3
/evidence=not_experimental
/product="bA409K20.1.3 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 3)"
/protein_id="CAC88166.1"
/db_xref="GI:15787708"
/translation="RVVAAFEPITVGLAIGAASAITGYLSYNDIYCRFAECCREERPL
NASALKLDLEEKLFGQHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQI
VAENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQSSLT"
mRNA join(5126..5266,8241..8416,9958..10085,10395..10456)
/gene="TOR1B"
/note="match: ESTs: Em:AI468027"
/product="bA409K20.1.2 (isoform 2)"
/evidence=not_experimental
mRNA join(5159..5391,8241..8416,11280..11319)
/gene="TOR1B"
/note="isoform 4
match: ESTs: Em:AI568476"
/product="bA409K20.1.4 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 4)"
/evidence=not_experimental
CDS join(<5159..5391,8241..8416,11280..11289)
/gene="TOR1B"
/codon_start=3
/evidence=not_experimental
/product="bA409K20.1.4 (torsin family 1, member B (torsin
B) (DQ1), putative isoform 4)"
/protein_id="CAC88167.1"
/db_xref="GI:15787709"
/translation="QHLATEVIFKALTGFRNNKNPKKPLTLSLHGWAGTGKNFVSQIV
AENLHPKGLKSNFVHLFVSTLHFPHEQKIKLYQDQLQKWIRGNVSACANSVFIFDEMD
KLHPGIIDAIKPFLDYYEQVDGVSYRKAIFIFLRVH"
misc_feature 5188..5526
/gene="TOR1B"
/note="match: STS: Em:G24606"
repeat_region 7370..7432
/note="MER61E repeat: matches 128..190 of consensus"
misc_feature complement(11923..12334)
/note="match: STS: Em:G27406"
misc_feature complement(12097..12334)
/note="match: STS: Em:G24725"
polyA_signal 12313..12318
/gene="TOR1B"
polyA_site 12334
/gene="TOR1B"
polyA_site complement(13997)
/gene="DYT1"
mRNA complement(join(13997..15275,19573..19700,19798..19973,
23634..23899,24961..25180))
/gene="DYT1"
/note="match: cDNAs: Em:AF007871
match: ESTs: Em:BE272533 Em:BE314317 Em:BE784377
Em:BF203163 Em:BE622540 Em:AI039978 Em:AI770117
Em:AW050630 Em:AI970719 Em:BE463967 Em:AI374678
Em:AI167967 Em:AI127274 Em:AI699731 Em:AW001722
Em:AI301894 Em:AW080988"
/product="bA409K20.2 (dystonia 1, torsion (autosomal
dominant; torsin A) (DQ2, TOR1A))"
/evidence=not_experimental
gene complement(13997..25180)
/gene="DYT1"
polyA_signal complement(14010..14015)
/gene="DYT1"
misc_feature 14016..14298
/note="match: STS: Em:G30092"
misc_feature complement(14429..14885)
/gene="DYT1"
/note="match: GSS: Em:B69651"
misc_feature complement(14469..14876)
/gene="DYT1"
/note="match: GSS: Em:B48142"
misc_feature complement(14494..14860)
/gene="DYT1"
/note="match: GSS: Em:AQ566167"
polyA_site complement(14632)
/gene="DYT1"
misc_feature 14650..15099
/note="match: STS: Em:G60041 Em:G60042"
misc_feature 14807..14914
/note="match: STS: Em:G43378 Em:G43379"
misc_feature 14885..15212
/note="match: GSS: Em:AQ213491"
misc_feature 14890..15392
/note="match: GSS: Em:AQ482600"
CDS complement(join(15025..15275,19573..19700,19798..19973,
23634..23899,24961..25138))



Query Match 92.3%; Score 1201.8; DB 9; Length 169963;
Best Local Similarity 97.5%; Pred. No. 0;
Matches 1271; Conservative 10; Mismatches 15; Indels 8; Gaps 6;



Qy 1 gccactccaagctaccatctgagattgtttcctgccctagagtggtaaaggcgtgaggtc 60 

I I I I I I M M I I I I M M I I I I I I I I M I I I I I I I I I M M I I I I I I I I I I I I I M I I 

Db 16533 GCCACTCCAAGC-ACCATCTGAGATTGTTTCCTGCCCTAGAGTGGTAAAGCCGTGAGGTC 16475

Qy 61 cgtctgccctcagctgtgtccccaggcccagggcgtgcctggcaacannagcaggcctct 120 

I I I M M M I I I I I I M I I I I I I I I I I M I I I I I I I II I I I I I I I I I I I II I I I I 

Db 16474 CGTCTGCCCTCAGCTGTGTCCCCAGGCCCAGGGCGTGCCTGGCA--CAGAGCAGGCCTCT 16417 

Qy 121 gagaaccagcctcccacgtgagttcatgatagnaagacagcccctcgttcccattcagtg 180 

I I II M I M I II I I I II II I I I I I I I I I M I I I I I I I M I I I I I I I I I I I I I I M I I I I 

Db 16416 GAGAACCAGCCTCCCACGTGAGTTCATGATAGCAAGACAGCCCCTCGTTCCCATTCAGTG 16357

Qy 181 gttggttctgttctttycctggcmataagctccactctg-ymrtcagccamacatttatt 239 

M I I I I II I I I I I M I : I I I I I I : I M I I I I I I M I I I ::: I I I I I II : I I I I I I I I I 

Db 16356 GTTGGTTCTGTTCTTTCCCTGGCCATAGGCTCCACTCTGTCAGTCAGCCACACATTTATT 16297 



Qy 240 gagtaccagttgttggcaaagcactgttgggcatgaaaagcattaacccagtgaatgagg 299 

M I I I I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 16296 GAGTACCAGTTGTGTGCAAAGCACTGTTGGGCATGAAAAGCATTAACCCAGTGAATGAGG 16237



Qy 300 aggagcttgggtt-gggacggagccmcaraawtacatggcagaccagaaggaaatcagct 358




Db 16236 AGGAGCTTGGGTTGGGGACGGAGCCCCAGAATTACATGGCAGACCAGAAGGGAATCAGCT 16177 

Qy 359 caagtagaaaracacgcatgggctcgtgggcgacgcagtgtgtgctgtgtcatctggggc 418 

II II : II II I M I I I I I I I I I M I I I I I I I I M M I I I I I I I I I I I I I I I I I I I 

Db 16176 CA — GTAGAAGACACGCATGGGCTCGTGGGCGACGCAGTGTGTGCTGTGTCATCTGGGGC 16119 

Qy 419 tgggaggaagtgtcctggatcaggagttccaggagcccaggaggagtggacgggtcagtg 478

I I I I I I I I I I I I I I I I I I I I I I I i M I I I I I I I I M I I I I I I I I I I I I I I I I I I I I M I 
Db 16118 TGGGAGGAAGTGTCCTGGGTCAGGAGTTCCAGGAGCCCAGGAGGAGTGGACGGGTCAGTG 16059 

Qy 479 cagagccagcccgcaatcaggggaagaaaacacggccaaggccaggccttcacggggagc 538 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II 
Db 16058 CAGAGCCAGCCCGCATTCAGGGG-AGAAAACACGGCCAAGGCCAGGCCTTCACGGGGAGC 16000 

Qy 539 ccagcgtgggctgcacatctgcactctccaggctagttttggtgcccacatgctctgcag 598 

I I I I I I I I I I I I I I I I M M I I I I I I I I I M I I I I I I I I I I I M I II I I I I I I I I I I I I I 
Db 15999 CCAGCGTGGGCTGCACATCTGCACTCTCCAGGCTAGTTTTGGTGCCCACATGCTCTGCAG 15940 

Qy 599 ggtctgggcactgtggcagcggcagcaggcttccctgttgctagtccagctgctgaaact 658 

I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 15939 GGTCTGGGCACTGTGGCAGCGGCAGCAGGCTTCCCTGTTGCTAGTCCAGCTGCTGAAACT 15880 

Qy 659 ccagggagagtcaaaaagttcccaaatacagaggcgtggctggtagtccttcccgggaat 718 

I II II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II M I I I I I I I I I I I I I I I 
Db 15879 CCAGGGAGAGTCAAAAAGTTCCCAAATACAGAGGCGTGGCTGGTAGTCCTTCCCGGGAAT 15820 

Qy 719 tcttcttgcttcccgctttctgtggaactctgccttccccactctgcctctctgcttgtt 778 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I M II I I I I I I I I I I I I I I I M 
Db 15819 TCTTCTTGCTTCCCGCTTTCTGTGGAACTCTGCCTTCCCCACTCTGCCTCTCTGCTTGTT 15760 

Qy 779 cctgggccccaggacctctttcccatcttcgatctcttaagtcataccttgggaggcctc 838 

I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 15759 CCTGGGCCCCAGGACCTCTTTCCCATCTTCGATCTCTTAAGTCATACCTTGGGAGGCCTC 15700 

Qy 839 ccccagcccgccgtgtaaagagggctgtcacagcttctgctgtcacagaagcattacaat 898

I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I 
Db 15699 CCCCAGCCCGCCGTGTAAAGAGGGCTGTCACAGCTTCTGCTGTCACAGAAGCATTACAAT 15640

Qy 899 gtgcaggtgcctgttaacatctgccttccccactgatctggagctccacaagggagaggg 958 

I I I I I I I I I I I I I I I I M II I I I I I I I I I M I I I I I I I M II I I I I I I I I I I I I I I I I M 
Db 15639 GTGCAGGTGCCTGTTAACATCTGCCTTCCCCACTGATCTGGAGCTCCACAAGGGAGAGGG 15580 

Qy 959 cacacccagtaggtatgtgtgggatggataggagggtggatgacacccagtagatgtgta 1018 
M M M M M M M M I I I I I I I I I M I I I I I I I I I M II I I I I I I I I I I I I I I I I M I I 

Db 15579 CACACCCAGTAGGTATGTGTGGGATGGATAGGAGGGTGGATGACACCCAGTAGATGTGTA 15520

Qy 1019 tgggatggataggagggtggatgacacccagtaggtgtgtatgggatggatgggagggtg 1078 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 15519 TGGGATGGATAGGAGGGTGGATGACACCCAGTAGGTGTGTATGGGATGGATGGGAGGGTG 15460

Qy 1079 ggtgacccctagtagatgtggggggggtgggtgggtgacccccagtaggtgtgtgtggca 1138 

I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 15459 GGTGACCCCTAGTAGATGTGGGGGGGGTGGGTGGGTGACCCCCAGTAGGTGTGTGTGGCA 15400 

Qy 1139 tggataggtgacccccagtagacgtttgtgggacggatgggagggtaggtaagtgacccc 1198 
I I I I I I I I I I I I I I M I I I I I I I I M II I I I I I I II I I I I I I I I I I I M I I I I I I I I I I I 



Db 15399 TGGATAGGTGACCCCCAGTAGACGTTTGTGGGACGGATGGGAGGGTAGGTAAGTGACCCC 15340 

Qy 1199 caggaggcgtctatagggcaggtgggtggatgtggatgaacagcaccttgtttcttcttc 1258 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I! II I I I I I I I I I I I I I I I I I I M I I I 
Db 15339 CAGGAGGCGTCTATAGGGCAGGTGGGTGGATGTGGATGAACAGCACCTTGTTTCTTCTTC 15280 

Qy 1259 ccaggtggcttctggcacagcagcttaattgaccggaacctcat 1302 

I I I I I I I I I I I I II M I I I I I I I I I I I I I I I M I I I I I I I I I I I 
Db 15279 CCAGGTGGCTTCTGGCACAGCAGCTTAATTGACCGGAACCTCAT 15236 




=> fil reg; d que l6
FILE 'REGISTRY' ENTERED AT 10:54:18 ON 07 JUN 2002
USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT. 
conducting SmartSELECT searches.
L4 63 SEA FILE=REGISTRY ABB=ON GCAAAACAGGGCUUUGUACCG | CGGUACAAAGCCCUGUUUUGC |
      AGUAGAGACGCGGGUAGAUG | CAUCUACCCGCGUCUCUACU | GCGUCUCUACUGCCUCUUCG |
      CGAAGAGGCAGUAGAGACGC | AUGCCCUGGUCCUAGUUCAG | UACGGGACCAGGAUCAAGUC |
      GGUUUCGCAAGGUGCUUGGA | UCCAAGCACCUUGCGAAACC/SQSN

L5 109 SEA FILE=REGISTRY ABB=ON GGGAUUCCAAACUUCCAUCC | GGAUGGAAGUUUGGAAUCCC |
      UCCAUGGGGUUGGUAGGAAC | GUUCCUACCAACCCCAUGGA | GGUGACAGAGUAAAACUAUCUG |
      CAGAUAGUUUUACUCUGUCACC | GACCCCCAGUAGACGUUUGU | ACAAACGUCUACUGGGGGUC |
      GUAAAAAAUCAUGAGCCCUGC | GCAGGGCUCAUGAUUUUUUAC/SQSN

L6 10 SEA FILE=REGISTRY ABB=ON (L4 OR L5) AND SQL<101
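The L4 and L5 terms appear to pair each primer from the REGISTRY answers, written in RNA notation, with its reverse complement, presumably so that either strand is retrieved; L6 then keeps only sequences shorter than 101 bases (SQL<101). The Python sketch below reproduces that construction under this assumption. `to_rna`, `rna_reverse_complement` and `query_terms` are hypothetical helpers, not STN commands. One term does not fit the pattern: UACGGGACCAGGAUCAAGUC in L4 is the plain (unreversed) complement of AUGCCCUGGUCCUAGUUCAG, so the sketch would not reproduce that term.

```python
from __future__ import annotations

# Sketch only: assumes each L4/L5 pair is a DNA primer rewritten in RNA
# notation (T -> U), followed by its reverse complement in RNA notation.

# Maps each DNA base to its RNA complement: A->U, C->G, G->C, T->A.
RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def to_rna(dna: str) -> str:
    """Rewrite a DNA primer 5'->3' in RNA notation (T -> U)."""
    return dna.upper().replace("T", "U")


def rna_reverse_complement(dna: str) -> str:
    """Complement every base into RNA, then reverse so the result reads 5'->3'."""
    return dna.upper().translate(RNA_COMPLEMENT)[::-1]


def query_terms(primers: list[str]) -> str:
    """Join each primer and its reverse complement with ' | ' and append /SQSN."""
    terms: list[str] = []
    for primer in primers:
        terms.append(to_rna(primer))
        terms.append(rna_reverse_complement(primer))
    return " | ".join(terms) + "/SQSN"


if __name__ == "__main__":
    # Two primers taken from the REGISTRY answers (RN 367568-18-1 and -19-2).
    print(query_terms(["gcaaaacagggctttgtaccg", "agtagagacgcgggtagatg"]))
```

For these two primers the printed string matches the first four L4 terms. The same reverse-complement step explains the "/c" hits later in the printout, where the query aligns to the opposite strand and the Db coordinates run downward.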



=> d rn cn kwic nte l6 1-10; fil capl; s l6

L6 ANSWER 1 OF 10 REGISTRY COPYRIGHT 2002 ACS 

RN 367568-27-2 REGISTRY 

CN DNA, d(G-T-A-A-A-A-A-A-T-C-A-T-G-A-G-C-C-C-T-G-C) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 33: PN: US20010029015 SEQID: 39 claimed DNA 

SQL 21 

SEQ 1 gtaaaaaatc atgagccctg c 



HITS AT: 1-21 

L6 ANSWER 2 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-26-1 REGISTRY 

CN DNA, d(G-A-C-C-C-C-C-A-G-T-A-G-A-C-G-T-T-T-G-T) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 32: PN: US20010029015 SEQID: 38 claimed DNA 
SQL 20 

SEQ 1 gacccccagt agacgtttgt 



HITS AT: 1-20 

L6 ANSWER 3 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-25-0 REGISTRY 

CN DNA, d(G-G-T-G-A-C-A-G-A-G-T-A-A-A-A-C-T-A-T-C-T-G) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 31: PN: US20010029015 SEQID: 37 claimed DNA 
SQL 22 



SEQ 1 ggtgacagag taaaactatc tg 



HITS AT: 1-22 

L6 ANSWER 4 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-24-9 REGISTRY 

CN DNA, d(T-C-C-A-T-G-G-G-G-T-T-G-G-T-A-G-G-A-A-C) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 30: PN: US20010029015 SEQID: 36 claimed DNA
SQL 20 

SEQ 1 tccatggggt tggtaggaac 



HITS AT: 1-20 

L6 ANSWER 5 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-23-8 REGISTRY 

CN DNA, d(G-G-G-A-T-T-C-C-A-A-A-C-T-T-C-C-A-T-C-C) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 29: PN: US20010029015 SEQID: 35 claimed DNA 
SQL 20 

SEQ 1 gggattccaa acttccatcc 



HITS AT: 1-20 

L6 ANSWER 6 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-22-7 REGISTRY 

CN DNA, d(G-G-T-T-T-C-G-C-A-A-G-G-T-G-C-T-T-G-G-A) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 28: PN: US20010029015 SEQID: 34 claimed DNA 
SQL 20 

SEQ 1 ggtttcgcaa ggtgcttgga 



HITS AT: 1-20 

L6 ANSWER 7 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-21-6 REGISTRY 

CN DNA, d(A-T-G-C-C-C-T-G-G-T-C-C-T-A-G-T-T-C-A-G) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 27: PN: US20010029015 SEQID: 33 claimed DNA 
SQL 20 

SEQ 1 atgccctggt cctagttcag 



HITS AT: 1-20 

L6 ANSWER 8 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-20-5 REGISTRY 

CN DNA, d(G-C-G-T-C-T-C-T-A-C-T-G-C-C-T-C-T-T-C-G) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 26: PN: US20010029015 SEQID: 32 claimed DNA 
SQL 20 

SEQ 1 gcgtctctac tgcctcttcg 



HITS AT: 1-20 

L6 ANSWER 9 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-19-2 REGISTRY 

CN DNA, d(A-G-T-A-G-A-G-A-C-G-C-G-G-G-T-A-G-A-T-G) (9CI) (CA INDEX NAME) 



OTHER NAMES:

CN 25: PN: US20010029015 SEQID: 31 claimed DNA 
SQL 20 



SEQ 1 agtagagacg cgggtagatg 



HITS AT: 1-20 

L6 ANSWER 10 OF 10 REGISTRY COPYRIGHT 2002 ACS 
RN 367568-18-1 REGISTRY 

CN DNA, d(G-C-A-A-A-A-C-A-G-G-G-C-T-T-T-G-T-A-C-C-G) (9CI) (CA INDEX NAME) 
OTHER NAMES: 

CN 24: PN: US20010029015 SEQID: 30 claimed DNA 
SQL 21 

SEQ 1 gcaaaacagg gctttgtacc g 



HITS AT: 1-21 



FILE 'CAPLUS' ENTERED AT 10:54:35 ON 07 JUN 2002

USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT. 

PLEASE SEE "HELP USAGETERMS" FOR DETAILS. 
L7 1 L6

=> d ibib ab hitrn 



L7 ANSWER 1 OF 1 CAPLUS COPYRIGHT 2002 ACS 



ACCESSION NUMBER:       2001:748274 CAPLUS
DOCUMENT NUMBER:        135:316961
TITLE:                  Nucleic acid sequences for torsins encoded by human
                        genes DYT1/TOR1A, TOR1B, and torsin-related genes and
                        their use in detecting torsion dystonia or neuronal
                        disease
INVENTOR(S):            Ozelius, Laurie J.; Breakefield, Xandra O.
PATENT ASSIGNEE(S):     The General Hospital Corp., USA
SOURCE:                 U.S. Pat. Appl. Publ., 85 pp., Cont.-in-part of U.S.
                        Ser. No. 461,921, abandoned.
                        CODEN: USXXCO
DOCUMENT TYPE:          Patent
LANGUAGE:               English
FAMILY ACC. NUM. COUNT: 3
PATENT INFORMATION:

     PATENT NO.      KIND  DATE      APPLICATION NO.   DATE
     US 2001029015   A1    20011011  US 2001-772105    20010126
     US 6387616      B1    20020514  US 1998-218363    19981222
PRIORITY APPLN. INFO.:               US 1997-50244P  P 19970619
                                     US 1998-99454  A2 19980618
                                     US 1998-218363 A2 19981222
                                     US 1999-461921 B2 19991215



AB The present invention relates to methods of detecting mutations and 
polymorphisms in the torsin gene, torsin-related genes, methods of 
detecting neuronal diseases mediated by these mutations and polymorphisms 
and nucleic acids used in these methods. A CAG deletion in exon 5 of the 
human gene DYT1/TOR1A and the DQ2 cDNA of this gene (encoding torsinA) 
causes early onset dystonia. The exon/intron structure and cDNAs of gene 
DYT1 have been characterized by sequence anal., and genetic polymorphisms
have been identified. An adjacent gene on human chromosome 9q34, named
TOR1B, encodes a homologous protein torsinB. Homol. searches have
identified human and mouse cDNAs for torsin-related proteins encoded by
genes TORP1 and TORP2. This invention provides for further anal. of the
torsinA gene family and its role in human disease. 

IT 367568-18-1 367568-19-2 367568-20-5 
367568-21-6 367568-22-7 367568-23-8 
367568-24-9 367568-25-0 367568-26-1 
367568-27-2 

RL: ARG (Analytical reagent use); THU (Therapeutic use); ANST (Analytical
study); BIOL (Biological study); USES (Uses)
(human gene DYT1/TOR1A specific primer; nucleic acid sequences for
torsins encoded by human genes TOR1A(DYT1), TOR1B, and torsin-related
genes and their use in detecting torsion dystonia or neuronal disease)



=> fil home

FILE 'HOME' ENTERED AT 10:54:57 ON 07 JUN 2002



RESULT 1 
AAC69659/c 

ID AAC69659 standard; cDNA; 853 BP. 
XX 

AC AAC69659;

XX

DT 30-JAN-2001 (first entry)

DE Human torsin A coding sequence. 
XX 

KW Cytostatic; vaccine; human; breast tumour; antigen; breast cancer; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200052165-A2. 

XX

PD ??-SEP-2000.

XX

PF 29-FEB-2000; 2000WO-US05431.

XX

PR 04-MAR-1999; 99US-0262505.

PR 19-MAR-1999; 99US-0272886.

PR 17-SEP-1999; 99US-0396313.
XX 

PA (CORI-) CORIXA CORP. 
XX 

PI Lodes MJ; 
XX 

DR WPI; 2000-572184/53. 
XX 

PT Breast tumor antigen polypeptides and polynucleotides, useful for 

PT manufacturing vaccines and compositions for treating, diagnosing, and 

PT monitoring breast cancer 

XX 

PS Claim 16; Fig 1; 140pp; English. 
XX 

CC The present invention relates to immunogenic portions of new human 

CC breast tumour antigens (AAB28183-B28214) and their coding sequences

CC (AAC69645-C69804). The breast tumour antigen polypeptides of the present

CC invention and their coding sequences are useful for inhibiting the 

CC development of breast cancer in a patient. The breast tumour antigen 

CC polypeptides and polynucleotides may be used in vaccines and 

CC pharmaceutical compositions for treating breast cancer, and for 

CC diagnosing and monitoring the cancer. The present sequence is a coding 

CC sequence for the immunogenic portion for one such human breast cancer 

CC tumour antigen. 

XX 

SQ Sequence 853 BP; 233 A; 177 C; 187 G; 256 T; 0 other; 

Query Match 100.0%; Score 21; DB 21; Length 853; 
Best Local Similarity 100.0%; Pred. No. 0.74; 

Matches 21; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 1 gtaaaaaatcatgagccctgc 21

     |||||||||||||||||||||

Db 85 GTAAAAAATCATGAGCCCTGC 65 




RESULT 2

AAV99925/c

ID AAV99925 standard; cDNA; 2072 BP.

AC AAV99925;

XX 

DT 12-MAY-1999 (first entry) 
XX 

DE DYT1 torsion dystonia gene (torsinA) . 
XX 

KW Torsion dystonia; DYT1; torsinA; torsinB; DQ2; DQ1;

KW neurotransmission; movement disorder; chorea; tremor; rigidity; 

KW Huntingtons disease; Parkinsons disease; diagnosis; prognosis; 

KW prevention; treatment; neurology; neuropathology; ds.

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 43..1041

FT /*tag= a 

FT /product= TorsinA_protein 
XX 

PN WO9857984-A2.

PD 23-DEC-1998.

XX

PF 19-JUN-1998; 98WO-US12776.
XX

PR 18-JUN-1998; 98US-0099454.

PR 19-JUN-1997; 97US-0050244.
XX 

PA (BREA/) BREAKEFIELD X. 

PA (OZEL/) OZELIUS L J. 
XX 

PI Breakefield X, Ozelius LJ; 
XX 

DR WPI; 1999-080947/07. 

DR P-PSDB; AAW81057. 
XX 

PT New isolated torsion dystonia genes - used to develop products for 

PT the diagnosis, prognosis, prevention and treatment of torsion 

PT dystonia 
XX 

PS Example 2; Page 106-109; 138pp; English. 
XX 

CC Movement disorders generally comprise some kind of aberrant 

CC neurotransmission. These often manifest themselves as 

CC uncontrollable body movements such as chorea in Huntington's

CC disease, tremor and rigidity in Parkinson's disease and twisting

CC contractions in torsion dystonia. Dystonic symptoms can be

CC secondary to neurological conditions but primary or torsion 

CC dystonia is characterised by a lack of other neurologic involvement 

CC and the absence of any distinct neuropathology. Clinical 

CC manifestations of torsion dystonia can affect many different body 

CC regions. Novel torsion dystonia genes, their polypeptide and 




CC protein products, recombinant nucleic acids comprising them, cells 

CC transformed by them or recombinant molecules in which they are 

CC contained, as well as antibody molecules directed against them, 

CC can be used to develop products for the diagnosis, prognosis, 

CC prevention and treatment of torsion dystonia. In particular, the 

CC torsin polypeptides can be used to treat torsion dystonia. This 

CC sequence is a composite nucleotide sequence of the torsinA gene.
XX 

SQ Sequence 2072 BP; 530 A; 489 C; 510 G; 543 T; 0 other; 



Query Match 100.0%; Score 21; DB 20; Length 2072;
Best Local Similarity 100.0%; Pred. No. 0.83;
Matches 21; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 1 gtaaaaaatcatgagccctgc 21
     |||||||||||||||||||||
Db 1315 GTAAAAAATCATGAGCCCTGC 1295

RESULT 3
AAV59658/c

ID AAV59658 standard; DNA; 2117 BP.
XX
AC AAV59658;
XX
DT 19-JAN-1999 (first entry)
XX
DE Human secreted protein gene 148 clone HSKG026.
XX
KW Human; secreted protein; fusion protein; gene therapy; protein therapy;
KW diagnosis; tissue; cancer; tumour; neurodegenerative disorder; leukaemia;
KW developmental abnormality; foetal deficiency; blood; allergy; renal; ds;
KW immune system; asthma; lymphocytic disease; brain; hepatic; lymphoma;
KW inflammation; ischaemic shock; Alzheimer's disease; restenosis; AIDS;
KW cognitive disorder; schizophrenia; prostate; obesity; osteoclast; thymus;
KW osteoporosis; arthritis; testis; lung; thyroiditis; thyroid; digestion;
KW endocrine; metabolism; regulation; malabsorption; gastritis; neoplasm.
XX
OS Homo sapiens.
XX
PN WO9839448-A2.
XX
PD 11-SEP-1998.
XX
PF 06-MAR-1998; 98WO-US04493.
XX
PR 02-OCT-1997; 97US-0061060.
PR 07-MAR-1997; 97US-0038621.
PR 07-MAR-1997; 97US-0040161.
PR 07-MAR-1997; 97US-0040162.
PR 07-MAR-1997; 97US-0040163.
PR 07-MAR-1997; 97US-0040333.
PR 07-MAR-1997; 97US-0040334.
PR 07
XX

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Bednarik DP, Brewer LA, Carter KC, Duan R, Ebner R, Endress GA; 

PI Feng P, Ferrie AM, Fischer CL, Florence KA, Greene JM, Hu JS; 

PI Kyaw H, Lafleur DW, Li Y, Moore PA, Ni J, Olsen HS, Rosen CA; 

PI Ruben SM, Shi Y, Soppet DR, Young PE, Yu GL, Zeng Z; 
XX 

DR WPI; 1998-506364/43. 

DR P-PSDB; AAW74876. 
XX 

PT New isolated human genes and the secreted polypeptide(s) they encode

PT - useful for diagnosis and treatment of e.g. cancers, neurological 

PT disorders, immune diseases, inflammation or blood disorders 
XX 

PS Claim 1; Page 383-384; 721pp; English. 
XX 

CC This sequence represents a nucleic acid molecule designated Gene 148 

CC from the human cDNA clone HSKG026 (deposited as clone ATCC 97903 and 

CC ATCC 209049) which encodes a secreted human protein. The gene can be 

CC used to generate fusion proteins by linking the gene to a human

CC immunoglobulin Fc portion (e.g. AAV59502) for increasing the stability of 

CC the fused protein as compared to the human protein only. 

CC The invention relates to 186 novel genes and their fragments (nucleic 

CC acid sequences: AAV59511-V59812; amino acid sequences AAW74731-W75026)




CC which are useful for preventing, treating or ameliorating medical 

CC conditions e.g. by protein or gene therapy. Also, pathological 

CC conditions can be diagnosed by determining the amount of the new 

CC polypeptides in a sample or by determining the presence of mutations in 

CC the new polynucleotides. Specific uses are described for each of the 186 

CC polynucleotides, based on which tissues they are most highly expressed in 

CC (see AAV59511 for described uses).

XX 

SQ Sequence 2117 BP; 556 A; 495 C; 516 G; 547 T; 3 other; 



Query Match 100.0%; Score 21; DB 19; Length 2117; 

Best Local Similarity 100.0%; Pred. No. 0.83; 

Matches 21; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 
Qy 1 gtaaaaaatcatgagccctgc 21

     |||||||||||||||||||||

Db 1304 GTAAAAAATCATGAGCCCTGC 1284

RESULT 4
AAV99923/c

ID AAV99923 standard; DNA; 2597 BP.
XX

AC AAV99923; 
XX 

DT 12-MAY-1999 (first entry) 
XX 

DE DYT1 torsion dystonia gene (torsinA, clone DQ2) . 
XX 

KW Torsion dystonia; DYT1; torsinA; torsinB; DQ2; DQ1;
KW neurotransmission; movement disorder; chorea; tremor; rigidity;
KW Huntingtons disease; Parkinsons disease; diagnosis; prognosis;
KW prevention; treatment; neurology; neuropathology; ds.
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 
FT CDS 568..1566

FT /*tag= a 

FT /product= TorsinA_protein 

XX

PN WO9857984-A2.
XX 

PD 23-DEC-1998. 
XX 

PF 19-JUN-1998; 98WO-US12776.
XX

PR 18-JUN-1998; 98US-0099454.

PR 19-JUN-1997; 97US-0050244.
XX 

PA (BREA/) BREAKEFIELD X. 

PA (OZEL/) OZELIUS L J. 
XX 

PI Breakefield X, Ozelius LJ;
XX 

DR WPI; 1999-080947/07. 




DR P-PSDB; AAW81055. 
XX 

PT New isolated torsion dystonia genes - used to develop products for 

PT the diagnosis, prognosis, prevention and treatment of torsion 

PT dystonia 
XX 

PS Claim 2; Page 94-97; 138pp; English. 
XX 

CC Movement disorders generally comprise some kind of aberrant 

CC neurotransmission. These often manifest themselves as 

CC uncontrollable body movements such as chorea in Huntington's 

CC disease, tremor and rigidity in Parkinson's disease and twisting 

CC contractions in torsion dystonia. Dystonic symptoms can be

CC secondary to neurological conditions but primary or torsion 

CC dystonia is characterised by a lack of other neurologic involvement 

CC and the absence of any distinct neuropathology. Clinical 

CC manifestations of torsion dystonia can affect many different body 

CC regions. Novel torsion dystonia genes, their polypeptide and 

CC protein products, recombinant nucleic acids comprising them, cells 

CC transformed by them or recombinant molecules in which they are 

CC contained, as well as antibody molecules directed against them, 

CC can be used to develop products for the diagnosis, prognosis, 

CC prevention and treatment of torsion dystonia. In particular, the 

CC torsin polypeptides can be used to treat torsion dystonia. This 

CC sequence encodes the torsion dystonia protein TorsinA and was 

CC isolated from human adult substantia nigra, hippocampus and 

CC frontal cortex. 

XX 

SQ Sequence 2597 BP; 652 A; 623 C; 656 G; 658 T; 8 other; 






Query Match 100.0%; Score 21; DB 20; Length 2597; 

Best Local Similarity 100.0%; Pred. No. 0.86; 

Matches 21; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 gtaaaaaatcatgagccctgc 21 

     |||||||||||||||||||||

Db 1840 GTAAAAAATCATGAGCCCTGC 1820

RESULT 5
AAS32785 

ID AAS32785 standard; DNA; 11853 BP. 
XX 

AC AAS32785; 
XX 

DT 17-DEC-2001 (first entry) 
XX 

DE Human genomic DNA for novel endocrine antigen, SEQ ID No 739. 
XX 

KW Human; endocrine antigen; ds; cytostatic; antiinfertility; antidiabetic;

KW thyroid-active; adrenal-active; androgenic; gastric; gene therapy;

KW antisense-therapy; antibody; endocrine disorder; hormone imbalance;

KW reproductive disorder; endocrine cancer; pancreatic disorder; 

KW diabetes mellitus; adrenal gland disorder; hirsutism; thyroid disorder; 

KW hyperthyroidism; hypothalamic disorder; vanishing testes syndrome. 

XX 




OS Homo sapiens.
XX
PA (HUMA-) HUMAN GENOME SCI INC.
XX
PI Rosen CA, Barash SC, Ruben SM;
XX
DR WPI; 2001-457726/49.
XX
PT Isolated polypeptide for treating, preventing and prognosing disorders
PT related to the endocrine system including endocrine disorders,
PT reproductive disorders, and gastrointestinal disorders and also for
PT testing and detection e.g. diagnosis -
XX
PS Disclosure; SEQ ID No 739; 558pp; English.
XX
CC The invention relates to cDNAs encoding novel human endocrine
CC antigens or a fragment having biological activity, a domain, an epitope,
CC full length protein, variant, allelic variant or a species homologue of
CC the cDNA/antigen. The DNAs and polypeptides are useful for preventing,
CC treating or ameliorating a medical condition when administered
CC (e.g. by gene therapy or antisense-therapy). Identifying mutations in
CC the genes coding for the antigens is useful for diagnosing a pathological
CC condition or a susceptibility to a pathological condition. The DNAs,
CC antigens and antibodies raised against the antigens are useful for treating,
CC preventing and/or prognosing disorders related to the endocrine system
CC or hormone imbalance or reproductive disorders, cancers of endocrine
CC tissues, disorders of the pancreas (e.g. diabetes mellitus), the adrenal
CC glands (e.g. hirsutism), ovaries, the thyroid (e.g. hyperthyroidism), the



CC hypothalamus and testes (e.g. vanishing testes syndrome); many examples

CC of diseases and disorders are given in the specification. The present

CC sequence is a genomic DNA fragment from a gene encoding an endocrine

CC antigen of the invention.

CC Note: The sequence data for this patent did not form part

CC of the printed specification, but was obtained in electronic

CC format directly from WIPO at

CC ftp.wipo.int/pub/published_pct_sequences.

XX 

SQ Sequence 11853 BP; 3002 A; 3353 C; 2845 G; 2653 T; 0 other; 

Query Match 100.0%; Score 21; DB 22; Length 11853; 

Best Local Similarity 100.0%; Pred. No. 1; 

Matches 21; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 1 gtaaaaaatcatgagccctgc 21

     |||||||||||||||||||||

Db 1370 gtaaaaaatcatgagccctgc 1390

RESULT 6
AAD07609/c

ID AAD07609 standard; cDNA; 546 BP.
XX

AC AAD07609;

XX

DT 10-AUG-2001 (first entry)
XX 

DE Human secreted protein-encoding gene 8 cDNA clone HATDM46, SEQ ID NO: 49.
XX 

KW Human; secreted protein; proliferative disorder; cancer; tumour; 

KW foetal abnormality; developmental abnormality; haematopoietic disorder; 

KW immune system disorder; AIDS; autoimmune disease; rheumatoid arthritis; 

KW inflammation; allergy; neurological disorder; Alzheimer's disease; 

KW Parkinson's disease; cognitive disorder; schizophrenia; asthma;

KW skin disorder; psoriasis; sepsis; diabetes; atherosclerosis; 

KW cardiovascular disorder; angiogenic disorder; kidney disorder; 

KW gastrointestinal disorder; pregnancy-related disorder; 

KW endocrine disorder; infection; wound healing; vulnerary; 

KW cell culture; chemotaxis; food additive; gene therapy; 

KW binding partner identification; ss. 

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers

FT CDS 131..337

FT /*tag= a

FT /product= "Human secreted protein precursor"

FT sig_peptide 131..184

FT /*tag= b

FT mat_peptide 185..334

FT /*tag= c 

FT /product= "Mature human secreted protein" 

XX 

PN WO200132676-A1. 
XX 



PD 10-MAY-2001. 
XX 

PF 25-OCT-2000; 2000WO-US29365.
XX

PR 29-OCT-1999; 99US-0162237.

PR 21-JUL-2000; 2000US-0219666.
XX 

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Ruben SM, Komatsoulis GA, Shi Y, Olsen HS, Soppet DR; 
XX 

DR WPI; 2001-328773/34. 

DR P-PSDB; AAE03090. 
XX 

PT Nucleic acids encoding 25 human secreted polypeptides, useful for 

PT preventing, diagnosing and/or treating e.g. Gaucher's disease,

PT Alzheimer's disease, Scimitar syndrome, Creutzfeldt-Jakob disease,

PT diabetes mellitus and multiple sclerosis - 

XX 

PS Claim 1; Page 434; 546pp; English. 
XX 

CC AAD07571-AAD07645 represent cDNAs corresponding to 25 human secreted 

CC protein genes, and AAE03052-AAE03126 represent the proteins they encode. 

CC AAE03127-AAE03150 represent human secreted protein fragments. The genes 

CC and their corresponding secreted proteins are useful for preventing, 

CC treating or ameliorating medical conditions, e.g., by protein or gene 

CC therapy. Pathological conditions can be diagnosed by determining the 

CC amount of the new protein in a sample or by determining the presence of 

CC mutations in the new genes. Specific uses are described for each of the 

CC 25 genes, based on the tissues in which they are most highly expressed, 

CC and include developing products for the diagnosis or treatment of 

CC proliferative disorders, cancer, tumours, foetal and developmental 

CC abnormalities, haematopoietic disorders, diseases of the immune system, 

CC AIDS, autoimmune diseases (e.g., rheumatoid arthritis), inflammation, 

CC allergies, neurological disorders (e.g., Alzheimer's disease, 

CC Parkinson's disease), cognitive disorders, schizophrenia, asthma, 

CC skin disorders (e.g., psoriasis), sepsis, diabetes, atherosclerosis, 

CC cardiovascular disorders, angiogenic disorders, kidney disorders, 

CC gastrointestinal disorders, pregnancy-related disorders, endocrine 

CC disorders, and infections. The proteins can also be used to aid wound 

CC healing and epithelial cell proliferation, to prevent skin aging due to 

CC sunburn, to maintain organs before transplantation, for supporting cell 

CC culture of primary tissues, to regenerate tissues, to identify their 

CC cognate ligands or binding partners, and in chemotaxis, and can be used 

CC as a food additive or preservative to modify storage properties. 

CC Antibodies specific for a protein of the invention can be used in 

CC alleviating symptoms associated with the disorders mentioned above, and 

CC in diagnostic immunoassays e.g., radioimmunoassay or enzyme linked 

CC immunosorbent assay (ELISA). The present sequence represents a human

CC secreted protein-encoding cDNA of the invention. 

XX 

SQ Sequence 546 BP; 131 A; 120 C; 118 G; 173 T; 4 other; 



Query Match 80.0%; Score 16.8; DB 22; Length 546; 

Best Local Similarity 90.0%; Pred. No. 84; 

Matches 18; Conservative 0; Mismatches 2; Indels 0; Gaps 0;



Qy 2 taaaaaatcatgagccctgc 21
     |||| |||||||||| ||||
Db 316 TAAATAATCATGAGCTCTGC 297
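The summary lines of each result can be checked against its alignment. A minimal Python sketch follows, assuming that Query Match is the score as a percentage of the query's best possible score (one point per identical base, which fits every 100.0% hit above, e.g. score 21 for the 21-mer) and that Best Local Similarity is matches divided by aligned positions. The 16.8 score itself comes from the search program's mismatch scoring, which is not shown here, so the sketch takes it as an input. `compare`, `query_match` and `best_local_similarity` are illustrative names, not part of any search package.

```python
# Sketch only: definitions inferred from the figures in this printout,
# not taken from the search program's documentation.


def compare(qy: str, db: str) -> tuple[int, int, str]:
    """Compare two equal-length, ungapped aligned strings, ignoring case.

    Returns (matches, mismatches, bar), where bar has '|' under identical bases.
    """
    if len(qy) != len(db):
        raise ValueError("aligned strings must be the same length")
    bar = "".join("|" if q == d else " " for q, d in zip(qy.upper(), db.upper()))
    matches = bar.count("|")
    return matches, len(bar) - matches, bar


def query_match(score: float, query_length: int, match_score: float = 1.0) -> float:
    """Score as a percentage of the all-identical score (assumed 1 point per base)."""
    return round(100.0 * score / (query_length * match_score), 1)


def best_local_similarity(matches: int, mismatches: int, indels: int = 0) -> float:
    """Identical positions as a percentage of all aligned positions."""
    return round(100.0 * matches / (matches + mismatches + indels), 1)


if __name__ == "__main__":
    m, mm, bar = compare("taaaaaatcatgagccctgc", "TAAATAATCATGAGCTCTGC")
    print(m, mm)                         # 18 2
    print(bar)
    print(best_local_similarity(m, mm))  # 90.0
    print(query_match(16.8, 21))         # 80.0
```

Applied to RESULT 6, the sketch reproduces the reported 18 matches, 2 mismatches, 90.0% Best Local Similarity and 80.0% Query Match.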



